Win32 Performance Measurement Options

Win32 provides six main timing functions for profiling. We'll analyze them and present six corresponding performance counter classes that wrap these functions. Also included is a template class that manipulates instances of any of the six timing classes in order to provide scoped timing operations.


May 01, 2003
URL:http://www.drdobbs.com/windows/win32-performance-measurement-options/184416651

Most real-world programs have performance requirements, and profiling is needed to determine where any bottlenecks lie. Developers are notoriously bad at intuiting which parts of their code need optimization and which do not, and so are advised to profile their code before attempting optimizations. However, they are often left without adequate guidance as to the best way of obtaining accurate performance analysis data, or indeed what performance measurement functions are appropriate for a particular scenario. By knowing the costs and benefits of the available timing options, developers can better judge which profiling techniques to use, and this will help them to factor profiling side effects into their data.

The Win32 API provides a number of different functions for eliciting timing information that may be useful in determining performance metrics. These functions, and the timing information they provide, vary in call cost, resolution, accuracy, scope, and availability, and the choice of which should be used depends on a number of factors, particularly the requirements for efficiency, accuracy, and the targeted platform(s).

This article will describe the six main Win32 timing functions and present corresponding performance counter classes that wrap these functions. The classes provide a simple and consistent interface, allowing developers to swap timing functionality as needed. I'll also present a template class that manipulates instances of any of the six timing classes in order to provide scoped timing operations.

All measurement affects that which is being measured, and software profiling is no exception. Indeed, it is often the case that such profiling has a deleterious and misleading effect on the system being measured. I'll compare the various timing functions available by qualitative analyses of documented resolution and availability (OS support), and quantitative analyses of their practical resolutions (how accurately they measure intervals) and call costs (how much the act of measurement costs).

I will discuss the costs and benefits of each approach and offer advice on when each is most suitable, as well as highlighting some techniques for increasing accuracy by reducing the impact of the measurement process. Finally, a seventh performance counter class will be presented that provides an optimal blend of the examined timing functionality, attempting to use the high-resolution functions but defaulting to a less accurate, but ubiquitous, function when the best timer is not available.

Win32 API Timing Functions

The five Win32 timing functions provided by the base API (as implemented in KERNEL32.DLL) are GetTickCount(), GetSystemTime()/GetSystemTimeAsFileTime() (see the section "System Time"), QueryPerformanceCounter(), GetThreadTimes(), and GetProcessTimes(). These are shown in Table 1, along with the commonly used timeGetTime() function provided by the Windows MultiMedia API, in WINMM.DLL. (The KERNEL32 functions require linking to kernel32.lib, and timeGetTime() requires linking to winmm.lib.)


Table 1: Win32 timing functions.


GetTickCount() takes no argument, and simply returns the number of milliseconds that have elapsed since the system was started. GetTickCount() is the only timing function used here (see the section "System Time") that is provided by all operating systems and on all hardware. Table 2 lists the functions and their support on Windows 9x (95, 98, and Me), Windows NT (NT 3.5, 3.51, 4, 2000, and XP), and Windows CE operating systems. timeGetTime() has the same signature and semantics as GetTickCount(). On Windows 9x systems its resolution is 1ms, whereas on Windows NT systems it is usually 5ms or more, but can be modified by calling the timeBeginPeriod() function. In the tests described here, it was left at its default behavior.


Table 2: Functions and their support.


System Time

The documentation for GetSystemTimeAsFileTime() states that it is equivalent to consecutive calls to GetSystemTime() and SystemTimeToFileTime(), as in:

  void GetSystemTimeAsFileTime(LPFILETIME lpft)
  {
      SYSTEMTIME  st;

      GetSystemTime(&st);
      SystemTimeToFileTime(&st, lpft);
  }

While this is true from a functional perspective, it is certainly not the case that it is actually implemented in this way on all operating systems, as can be seen in Table 3.


Table 3: Call cost of system time functions (as percentage of GetSystemTimeAsFileTime()).


On Windows 98, the call costs are roughly equivalent. However, on all the NT-family operating systems, the cost of gleaning time in the form of an intermediate SYSTEMTIME is around 400 times that of GetSystemTimeAsFileTime().

While in almost all cases the multimedia timer offers no advantage over GetTickCount(), it still finds popular use since its measurement resolution is configurable. In addition, the fact that its resolution was 10 times better than GetTickCount() on one of the machines examined shows that it is worth having in one's toolbox. The timeGetSystemTime() function was not examined since its documentation states it has a higher cost than timeGetTime(). Also, its use would result in a more complicated class implementation.

GetSystemTime() retrieves the current system time and populates a SYSTEMTIME structure, which is composed of a number of separate fields including year, month, day, hours, minutes, seconds, and milliseconds. A peer function, GetSystemTimeAsFileTime(), retrieves the current system time in a single 64-bit argument (in the form of the Win32 FILETIME structure), measured in 100ns intervals. See the previous section "System Time" for a discussion of their implementation relationship.

If a system has a high-resolution counter, then QueryPerformanceCounter() may be used to obtain the current (64-bit) value of the high-performance counter, in the form of a Win32 LARGE_INTEGER structure. The value returned is the current count of the hardware counter and does not, in and of itself, represent a specific time unit. Because different hardware counters may use different counting frequencies, QueryPerformanceFrequency() must be called (once per host session) to determine the high-performance counter frequency in order to convert the performance counter values into time intervals. For example, if the high-performance counter frequency is 1,000,000 counts per second, and two successive calls to QueryPerformanceCounter() yield a difference of 2000, then 2ms have elapsed. When no hardware counter is available, both QueryPerformanceCounter() and QueryPerformanceFrequency() return FALSE. In practice, I have not encountered a laptop or desktop machine (running 9x or NT) on which a high-performance counter is not available.
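The conversion just described can be sketched in a few lines of portable C++. The helper name counts_to_milliseconds is hypothetical (it is not part of Win32 or WinSTL). On Windows, its two arguments would be the difference between two QueryPerformanceCounter() readings and the value reported by QueryPerformanceFrequency().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Win32 or WinSTL function): converts the
// difference between two counter readings into milliseconds, given the
// counter frequency in counts per second.
inline double counts_to_milliseconds(std::int64_t count_delta, std::int64_t frequency)
{
    assert(frequency > 0); // a zero frequency would mean "no counter available"

    // counts / (counts per second) gives seconds; scale by 1000 for ms.
    return (static_cast<double>(count_delta) * 1000.0) / static_cast<double>(frequency);
}
```

With the figures from the text, counts_to_milliseconds(2000, 1000000) evaluates to 2.0.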

Note that while I have not seen it documented that the value returned by QueryPerformanceFrequency() is fixed for a particular processor, I have never encountered a machine on which this does not hold true. Indeed, experiments showed that while the processor frequency for one of the laptops used in these tests is affected by running in battery mode, the performance frequency is unaffected (3,579,545 in both cases). I am, therefore, reasonably confident that this assumption holds in all cases.

GetTickCount(), timeGetTime(), GetSystemTime()/GetSystemTimeAsFileTime(), and QueryPerformanceCounter() all yield values on a systemwide basis. In other words, they measure absolute times on the system, so if the system has other busy processes, the measured values will reflect that activity. While it is commonly the case that one can run performance tests on a system where all other processes are in a quiescent state, sometimes it is not possible. Furthermore, it is sometimes desirable to get a finer-grained look into a process's activities, in terms of the individual performance costs of the kernel and user components.

On Windows NT operating systems, the GetThreadTimes() and GetProcessTimes() functions provide this information on a per-thread and per-process basis, respectively. These Win32 functions provide four 64-bit values (of type FILETIME) to the caller for the creation time, exit time, current kernel time, and current user time for the given thread/process, measured in 100ns intervals.

The Performance Counter Classes

The six classes presented here — tick_counter, multimedia_counter, systemtime_counter, highperformance_counter, threadtimes_counter, and processtimes_counter — are from the WinSTL performance library, and are based on the six Win32 timing functions described in Table 1. The essentials of each implementation are shown in Listings One, Two, Three, Four, Five, and Six.

Listing One: Extract from winstl_tick_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_tick_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// Operations
inline void tick_counter::start()
{
    m_start = ::GetTickCount();
}

inline void tick_counter::stop()
{
    m_end = ::GetTickCount();
}

// Attributes
inline tick_counter::interval_type tick_counter::get_period_count() const
{
    return static_cast<interval_type>(m_end - m_start);
}

inline tick_counter::interval_type tick_counter::get_seconds() const
{
    return get_period_count() / interval_type(1000);
}

inline tick_counter::interval_type tick_counter::get_milliseconds() const
{
    return get_period_count();
}

inline tick_counter::interval_type tick_counter::get_microseconds() const
{
    return get_period_count() * interval_type(1000);
}

Listing Two: Extract from winstl_multimedia_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_multimedia_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// Operations
inline void multimedia_counter::start()
{
    m_start = ::timeGetTime();
}

inline void multimedia_counter::stop()
{
    m_end = ::timeGetTime();
}

Listing Three: Extract from winstl_systemtime_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_systemtime_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// Operations
inline void systemtime_counter::start()
{
    ::GetSystemTimeAsFileTime(reinterpret_cast<LPFILETIME>(&m_start));
}

inline void systemtime_counter::stop()
{
    ::GetSystemTimeAsFileTime(reinterpret_cast<LPFILETIME>(&m_end));
}

// Attributes
inline systemtime_counter::interval_type systemtime_counter::get_seconds() const
{
    return get_period_count() / interval_type(10000000);
}

inline systemtime_counter::interval_type systemtime_counter::get_milliseconds() const
{
    return get_period_count() / interval_type(10000);
}

inline systemtime_counter::interval_type systemtime_counter::get_microseconds() const
{
    return get_period_count() / interval_type(10);
}

Listing Four: Extract from winstl_highperformance_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_highperformance_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline /* static */ highperformance_counter::interval_type highperformance_counter::_query_frequency()
{
    interval_type   frequency;

    // If no high-performance counter is available ...
    if( !::QueryPerformanceFrequency(reinterpret_cast<LARGE_INTEGER*> (&frequency)) ||
        frequency == 0)
    {
        // ... then set the divisor to be the maximum value, guaranteeing that 
        // the timed periods will always evaluate to 0.
        frequency = stlsoft_ns_qual(limit_traits)<interval_type>::maximum();
    }

    return frequency;
}

inline /* static */ highperformance_counter::interval_type highperformance_counter::_frequency()
{
    static interval_type  s_frequency = _query_frequency();

    return s_frequency;
}

// Operations
inline void highperformance_counter::start()
{
    ::QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&m_start));
}

inline void highperformance_counter::stop()
{
    ::QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&m_end));
}

// Attributes
inline highperformance_counter::interval_type highperformance_counter::get_seconds() const
{
    return get_period_count() / _frequency();
}

inline highperformance_counter::interval_type highperformance_counter::get_milliseconds() const
{
    highperformance_counter::interval_type  result;
    highperformance_counter::interval_type  count   =   get_period_count();

    if(count < __STLSOFT_GEN_SINT64_SUFFIX(0x20C49BA5E353F7))
    {
        result = (count * interval_type(1000)) / _frequency();
    }
    else
    {
        result = (count / _frequency()) * interval_type(1000);
    }

    return result;
}

inline highperformance_counter::interval_type highperformance_counter::get_microseconds() const
{
    highperformance_counter::interval_type  result;
    highperformance_counter::interval_type  count   =   get_period_count();

    if(count < __STLSOFT_GEN_SINT64_SUFFIX(0x8637BD05AF6))
    {
        result = (count * interval_type(1000000)) / _frequency();
    }
    else
    {
        result = (count / _frequency()) * interval_type(1000000);
    }

    return result;
}


Listing Five: Extract from winstl_threadtimes_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_threadtimes_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline threadtimes_counter::threadtimes_counter()
    : m_thread(::GetCurrentThread())
{
}

// Operations
inline void threadtimes_counter::start()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetThreadTimes(   m_thread,
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelStart),
                        reinterpret_cast<LPFILETIME>(&m_userStart));
}

inline void threadtimes_counter::stop()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetThreadTimes(   m_thread,
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelEnd),
                        reinterpret_cast<LPFILETIME>(&m_userEnd));
}

// Attributes

// Kernel
inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_period_count() const
{
    return static_cast<interval_type>(m_kernelEnd - m_kernelStart);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_seconds() const
{
    return get_kernel_period_count() / interval_type(10000000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_milliseconds() const
{
    return get_kernel_period_count() / interval_type(10000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_microseconds() const
{
    return get_kernel_period_count() / interval_type(10);
}

// User
inline threadtimes_counter::interval_type threadtimes_counter::get_user_period_count() const
{
    return static_cast<interval_type>(m_userEnd - m_userStart);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_user_seconds() const
{
    return get_user_period_count() / interval_type(10000000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_user_milliseconds() const
{
    return get_user_period_count() / interval_type(10000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_user_microseconds() const
{
    return get_user_period_count() / interval_type(10);
}

// Total
inline threadtimes_counter::interval_type threadtimes_counter::get_period_count() const
{
    return get_kernel_period_count() + get_user_period_count();
}

inline threadtimes_counter::interval_type threadtimes_counter::get_seconds() const
{
    return get_period_count() / interval_type(10000000);
}

inline threadtimes_counter::interval_type    threadtimes_counter::get_milliseconds() const
{
    return get_period_count() / interval_type(10000);
}

inline threadtimes_counter::interval_type    threadtimes_counter::get_microseconds() const
{
    return get_period_count() / interval_type(10);
}

Listing Six: Extract from winstl_processtimes_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_processtimes_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline /* static */ HANDLE processtimes_counter::_get_process_handle()
{
    static HANDLE   s_hProcess = ::GetCurrentProcess();

    return s_hProcess;
}

// Operations
inline void processtimes_counter::start()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetProcessTimes(  _get_process_handle(),
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelStart),
                        reinterpret_cast<LPFILETIME>(&m_userStart));
}

inline void processtimes_counter::stop()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetProcessTimes(  _get_process_handle(),
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelEnd),
                        reinterpret_cast<LPFILETIME>(&m_userEnd));
}

(The full implementations are provided in the archive and are available in their most up-to-date form online at http://winstl.org/.) They all have a similar form and semantics according to the following format:

  class xxx_counter
  {
  public:
    ...

    typedef ws_sint64_t epoch_type;
    typedef ws_sint64_t interval_type;

  // Operations
  public:
    void start();
    void stop();

  // Attributes
  public:
    interval_type get_period_count() const;
    interval_type get_seconds() const;
    interval_type get_milliseconds() const;
    interval_type get_microseconds() const;

    ...
};

By providing the same interface, they can easily be substituted (either by a single typedef change or as a result of preprocessor environment discrimination) to suit the needs of the program(mer).

The start() method causes the first timing instant to be recorded, and the stop() method causes the second timing instant to be recorded. start() and stop() can be called multiple times, allowing staged timings, although obviously you will get nonsense values from the period attributes if start() is called after calling stop(). (Indeed, this is the reason that the interval types are signed, so that such values are negative and can, therefore, be more easily spotted.) Each of the classes calculates the elapsed time from the difference between these two instant values.

The elapsed time for the measured period is provided by each class in units of seconds, milliseconds, and microseconds via the get_seconds(), get_milliseconds(), and get_microseconds() methods, respectively. The resolution of the return values from these methods depends on the underlying timing function. For example, tick_counter's get_microseconds() will always return exactly 1000 times the value returned by get_milliseconds(), since GetTickCount()'s measurement resolution is (at best) 1 millisecond.
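The start()/stop() semantics and the resolution point above can be seen in a small portable sketch. Everything in it is hypothetical: fake_tick_source() stands in for a millisecond timer such as GetTickCount() and simply advances by 5 on every call, and sketch_tick_counter is an illustration of the common interface, not the WinSTL tick_counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a millisecond timing function such as
// GetTickCount(): every call reports a time 5ms later than the previous call.
inline std::int64_t fake_tick_source()
{
    static std::int64_t s_now = 0;
    s_now += 5;
    return s_now;
}

// Illustrative counter with the same shape as the classes described above.
class sketch_tick_counter
{
public:
    typedef std::int64_t interval_type; // signed, so misuse shows up as negative

    sketch_tick_counter() : m_start(0), m_end(0) {}

    void start() { m_start = fake_tick_source(); }
    void stop()  { m_end = fake_tick_source(); }

    interval_type get_period_count() const { return m_end - m_start; }
    interval_type get_seconds() const      { return get_period_count() / 1000; }
    interval_type get_milliseconds() const { return get_period_count(); }
    interval_type get_microseconds() const { return get_period_count() * 1000; }

private:
    interval_type m_start;
    interval_type m_end;
};

inline void demonstrate_sketch_counter()
{
    sketch_tick_counter counter;

    counter.start();
    counter.stop();
    assert(counter.get_milliseconds() == 5);
    assert(counter.get_microseconds() == 5000); // never finer than 1ms

    counter.start(); // start() with no subsequent stop()
    assert(counter.get_period_count() < 0); // nonsense, but easy to spot
}
```

Because the interval type is signed, the out-of-order call produces a negative period rather than a very large positive one.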

Each class also provides the get_period_count() method, which returns the extent of the elapsed period (in timing function-specific increments) by calculating the difference between the start and stop instant values. This can be of use when doing relative performance measures, since this method generally has a lower performance cost than any of the elapsed time methods (because most of them have to perform additional multiplications/divisions in order to convert into time units).

The methods of all the classes are implemented inline for maximum efficiency. (Examination of the generated object code has shown that the inlining is carried out, and there is no significant additional overhead when using the class implementations over the Win32 functions directly.) Furthermore, having all the methods as inline simplifies use of the library since there are no implementation files to compile and link. Where pertinent, late-evaluation (also known as lazy-evaluation) techniques and static members are used so that the costs of calls (such as to GetCurrentProcess()) are only incurred once, and only when their information is actually needed.

tick_counter and multimedia_counter — tick_counter and multimedia_counter record the 32-bit unsigned values returned by GetTickCount() and timeGetTime(), respectively, in the start() and stop() methods into their m_start and m_end members. get_milliseconds() simply returns get_period_count(), get_microseconds() returns get_period_count() multiplied by 1000, and get_seconds() returns get_period_count() divided by 1000.

systemtime_counter — systemtime_counter records the FILETIME value obtained from GetSystemTimeAsFileTime() in its start() and stop() methods, converting to ws_sint64_t (see the section "Win32 64-Bit Integers"). get_period_count() returns a value in 100ns increments, so get_seconds(), get_milliseconds(), and get_microseconds() are implemented to return this value divided by 10,000,000, 10,000, and 10, respectively. GetSystemTimeAsFileTime() is preferred over GetSystemTime() because it exists on all platforms save CE, is far more efficient on NT, and affords a simpler and cleaner implementation of the class.

Win32 64-Bit Integers

The structure layouts of LARGE_INTEGER and FILETIME are such that it is safe to cast them to and from a 64-bit integer, ws_sint64_t (defined to be signed __int64 for Borland, Digital Mars, Visual C++, and Watcom compilers, and signed long long for Comeau, GCC, and Metrowerks compilers), so long as the platform is little-endian (e.g., Intel), since the LowPart/dwLowDateTime member precedes the HighPart/dwHighDateTime member. I do not have access to the Win32 headers for any big-endian systems (e.g., PowerPC, Alpha), so I cannot presume that the layout of the structure members would be reversed (although I hope they would be), which would maintain the compatibility with the 64-bit integers. If they are not, then the systemtime_counter, highperformance_counter, threadtimes_counter, and processtimes_counter classes are not valid for systems that are not little-endian.
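One way to sidestep the layout question altogether, at the cost of a few extra instructions, is to combine the two 32-bit halves arithmetically instead of casting. The following is a minimal sketch of that alternative, not what the WinSTL classes do. The filetime_like type is a hypothetical stand-in that mirrors the two halves of FILETIME (dwLowDateTime and dwHighDateTime), and the function names are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in with the same two 32-bit halves as Win32's FILETIME
// (dwLowDateTime and dwHighDateTime).
struct filetime_like
{
    std::uint32_t low_part;
    std::uint32_t high_part;
};

// Combines the halves arithmetically, so the result does not depend on the
// order of the members in memory or on the byte order of the platform.
inline std::int64_t to_int64(filetime_like const &ft)
{
    std::uint64_t const combined =
        (static_cast<std::uint64_t>(ft.high_part) << 32) | ft.low_part;

    return static_cast<std::int64_t>(combined);
}

inline filetime_like from_int64(std::int64_t value)
{
    assert(value >= 0); // time and counter values are expected to be non-negative

    std::uint64_t const bits = static_cast<std::uint64_t>(value);
    filetime_like       ft;

    ft.low_part  = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    ft.high_part = static_cast<std::uint32_t>(bits >> 32);

    return ft;
}
```

The cast used in the listings costs nothing at run time, which matters in a timing class. The arithmetic form trades a shift and an OR for independence from structure layout.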

highperformance_counter — highperformance_counter records the LARGE_INTEGER values obtained from QueryPerformanceCounter() in its start() and stop() methods, converting to ws_sint64_t (see the previous section "Win32 64-Bit Integers"). get_seconds() is implemented by dividing the value returned from get_period_count() by the frequency (obtained from QueryPerformanceFrequency()). This frequency is hardware dependent, but is commonly the processor frequency or a small factor thereof. The _frequency() method obtains the frequency via a once-only (since the s_frequency variable is static) call to _query_frequency(). _query_frequency() is implemented such that if QueryPerformanceFrequency() returns FALSE, indicating the absence of high-performance counter support, the value returned is the maximum value for its type, so that future divisions will evaluate to 0, rather than crashing on a divide-by-zero error.

get_milliseconds() and get_microseconds() are implemented by multiplying get_period_count() by 1000 and 1,000,000, respectively, and dividing by the frequency. In order to avoid truncation of the result when the period count is low, or overflow when it is high, the multiplication is carried out first if overflow will not occur, and afterwards if it will.
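The ordering rule can be written as a single generic helper. This is a sketch under my own naming (scale_count is hypothetical). It derives the threshold from the multiplier at run time rather than using the precomputed constants in Listing Four, and it assumes non-negative counts.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: computes (count * multiplier) / frequency while
// avoiding overflow of a signed 64-bit integer.
inline std::int64_t scale_count(std::int64_t count, std::int64_t multiplier, std::int64_t frequency)
{
    assert(count >= 0 && multiplier > 0 && frequency > 0);

    // Largest count for which count * multiplier still fits in 64 bits.
    std::int64_t const safe_limit = std::numeric_limits<std::int64_t>::max() / multiplier;

    if(count <= safe_limit)
    {
        // Multiply first: keeps the fraction that an early division would discard.
        return (count * multiplier) / frequency;
    }

    // Divide first: loses some precision, and stays in range provided that
    // frequency >= multiplier.
    return (count / frequency) * multiplier;
}
```

For a low count the difference is visible immediately: scale_count(1, 1000, 3) gives 333, whereas dividing first would give 0.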

Since it is required only for calculating the period, rather than measuring it, I employed a combination of late evaluation and statics to defer the expensive call to QueryPerformanceFrequency() until after the measurement is complete, as well as, of course, only doing it once per process. Indeed, if you only use get_period_count() — and not get_seconds(), get_milliseconds(), or get_microseconds() — then this cost is not incurred at all.
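The late-evaluation technique can be illustrated without any Win32 calls. In this sketch, query_frequency_stub() is a hypothetical stand-in for the expensive query (its return value is arbitrary), and g_query_calls exists only to make the number of queries observable.

```cpp
#include <cassert>
#include <cstdint>

// Counts how often the (pretend) expensive query runs; for illustration only.
int g_query_calls = 0;

// Hypothetical stand-in for an expensive one-off query such as
// QueryPerformanceFrequency(); the value returned is arbitrary.
std::int64_t query_frequency_stub()
{
    ++g_query_calls;
    return 3579545;
}

// The function-local static is initialized on the first call only, so the
// query is deferred until it is needed and is never repeated.
std::int64_t cached_frequency()
{
    static std::int64_t const s_frequency = query_frequency_stub();

    assert(s_frequency > 0);

    return s_frequency;
}
```

However many times cached_frequency() is called, the stub runs once, and it never runs if cached_frequency() is never called. Note that C++11 and later guarantee that initialization of a function-local static is thread-safe. Compilers contemporary with this article made no such guarantee.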

processtimes_counter and threadtimes_counter — As well as providing the four period methods that all the other counter classes provide, these two classes also provide four corresponding methods each for kernel time and user time. processtimes_counter and threadtimes_counter record the values for kernel and user time obtained from GetProcessTimes() and GetThreadTimes(), respectively, in their start() and stop() methods into the m_kernelStart, m_kernelEnd, m_userStart, and m_userEnd members (casting in the same way as in systemtime_counter). In addition to this being a finer-grained level of measurement, the figures obtained from these classes are not affected by other processes, unlike those from the four other classes. (I have included a program, counter_isolation, in the archive that demonstrates this behavior for the threadtimes_counter class.)

The thread/process handles are specified as the current ones, via GetCurrentThread()/GetCurrentProcess(). threadtimes_counter records the current thread handle in a member variable to avoid unnecessarily repeating this small but nonzero cost. processtimes_counter uses a static technique such that GetCurrentProcess() is called only once per process. The creation time and exit time values obtained from GetThreadTimes()/GetProcessTimes() are ignored. (They are, in fact, fixed values, and the exit time is not actually valid until the given thread has exited.)

These two classes have the following attribute methods in addition to those they share with the four other classes:

...

  // Attributes
  public:
    ...

    interval_type get_kernel_period_count() const;
    interval_type get_kernel_seconds() const;
    interval_type get_kernel_milliseconds() const;
    interval_type get_kernel_microseconds() const;
    interval_type get_user_period_count() const;
    interval_type get_user_seconds() const;
    interval_type get_user_milliseconds() const;
    interval_type get_user_microseconds() const;

    ...
};

get_kernel_period_count() and get_user_period_count() are implemented as returning the difference of the kernel members and user members, respectively. get_period_count() is implemented as the sum of get_kernel_period_count() and get_user_period_count(). The calculations of all the seconds, milliseconds, and microseconds are performed in the same way as those of systemtime_counter.
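One consequence of these definitions is worth a small illustration: because each conversion truncates, the total in milliseconds is not necessarily the sum of the kernel and user milliseconds. The sketch below uses hypothetical names (times_snapshot and the free functions) and hand-picked readings in 100ns intervals. It mirrors the arithmetic described above rather than the WinSTL code itself.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative holder for the four readings described above, each a count of
// 100ns intervals. The struct and function names are hypothetical.
struct times_snapshot
{
    std::int64_t kernel_start;
    std::int64_t kernel_end;
    std::int64_t user_start;
    std::int64_t user_end;
};

inline std::int64_t kernel_period_count(times_snapshot const &t) { return t.kernel_end - t.kernel_start; }
inline std::int64_t user_period_count(times_snapshot const &t)   { return t.user_end - t.user_start; }
inline std::int64_t total_period_count(times_snapshot const &t)
{
    return kernel_period_count(t) + user_period_count(t);
}

// 10,000 intervals of 100ns make up one millisecond.
inline std::int64_t to_milliseconds(std::int64_t intervals) { return intervals / 10000; }

inline void demonstrate_truncation()
{
    times_snapshot const t = { 0, 15000, 0, 15000 }; // 1.5ms kernel, 1.5ms user

    assert(to_milliseconds(kernel_period_count(t)) == 1);
    assert(to_milliseconds(user_period_count(t)) == 1);
    assert(to_milliseconds(total_period_count(t)) == 3); // not 1 + 1
}
```

When the kernel/user split matters, comparing period counts (or microseconds) rather than milliseconds reduces the effect of this truncation.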

performance_counter_scope — The similar public interface to each class facilitates the use of a scoping template class, performance_counter_scope (shown in Listing Seven), which implements the "Resource Acquisition Is Initialization" idiom and may be parameterized on a particular counter class. The constructor takes a reference to a counter class instance, and then calls start(). stop() is called in the destructor, providing a scoped timing operation. It also provides access to the stop() method in order to support intermediate staged timings, and a reference to const of the managed counter class such that intermediate timing values can be obtained. An example of its use is shown in Listing Eight.

Listing Seven: Extract from winstl_performance_counter_scope.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_performance_counter_scope.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// class performance_counter_scope
template <ws_typename_param_k T>
class performance_counter_scope
{
public:
    typedef T                               counter_type;
    typedef performance_counter_scope<T>    class_type;

public:
    ws_explicit_k performance_counter_scope(counter_type &counter)
        : m_counter(counter)
    {
        m_counter.start();
    }
    ~performance_counter_scope()
    {
        m_counter.stop();
    }

    void stop()
    {
        m_counter.stop();
    }

    // This method is const, to ensure that only the stop operation 
    // (via performance_counter_scope::stop()) is accessible 
    // on the managed counter.
    const counter_type &get_counter() const
    {
        return m_counter;
    }

// Members
protected:
    T   &m_counter;

// Not to be implemented
private:
    performance_counter_scope(class_type const &rhs);
    class_type const &operator =(class_type const &rhs);
};

Listing Eight: Extract from counter_cost.cpp


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_cost.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>
#include <winstl_performance_counter_scope.h>
#include <winstl_performance_counter_init.h>

/* //////////////////////////////////////////////////////////// */

const int   C_ITERATIONS    =   1000000;

/* //////////////////////////////////////////////////////////// */

typedef highperformance_counter application_counter_type;

template<
            ws_typename_param_k C1
        ,   ws_typename_param_k C2
        >
inline ws_typename_type_k C1::interval_type
test_cost(C1 &app_counter, C2 &counter)
{
  for(int i = 0; i < 2; ++i)
  {
    performance_counter_scope<C1>   scope(app_counter);

    for(int j = 0; j < C_ITERATIONS; ++j)
    {
      counter.start();
      counter.stop();
    }
  }

  return app_counter.get_milliseconds();
}

int main(int /* argc */, char* /* argv */[])
{
  performance_counter_init<application_counter_type>  app_counter;

#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)    
 #define _counter_test_fmt	"%I64d"
#else
 #define _counter_test_fmt	"%lld"
#endif /* compiler */

#define _test_counter(_x)   \
  do \
  { \
    _x x; \
   \
    printf( #_x ": " _counter_test_fmt "ms\n", \
            test_cost(app_counter, x)); \
  } \
  while(0)

  _test_counter(tick_counter);
  _test_counter(multimedia_counter);
  _test_counter(systemtime_counter);
  _test_counter(highperformance_counter);
  _test_counter(threadtimes_counter);
  _test_counter(processtimes_counter);
  _test_counter(performance_counter);

  return 0;
}

The original proprietary implementations of the performance classes called their start() methods in their constructors, as well as initializing their member variables, as a syntactic convenience, such that the following would produce meaningful results:

  performance_counter counter;

  some_operation();
  counter.stop();

  printf("...", counter.get_xxx());

However, the way these classes are actually used in almost all cases, along with the strong requirement for them to be as efficient as possible, has shown this to be a mistake. Because instances are often used in a number of start()-stop() cycles, as can be seen in the test program, having start() called in the constructor complicates the semantics for no net benefit. Nor does it ensure that the instance has a coherent state, since the attribute calls have well-defined behavior only after a subsequent stop() call is made (see the section "Initialized Counters").

Performance Analysis

The test scenarios described here were executed on the following platforms: Windows 98 (233 MHz), NT 4 (400 MHz), 2000 (650-MHz laptop), 2000 (dual 550 MHz), NT 4 (dual 933 MHz), 2000 (dual 933 MHz), and XP (2 GHz). (All program code and supporting files are included in the archive, along with Visual C++ 6 and Metrowerks CodeWarrior 8 projects.)

Initialized Counters

If you need to have your counter class initialized to meaningful values from the point of its construction, you can derive from the one you are interested in, and call its start() and stop() members in the constructor of your derived class. Alternatively, you can use the WinSTL performance_counter_init class template:

template <class C>
class performance_counter_init
  : public C
{
public:
  typedef C     counter_type;

// Construction
public:
  performance_counter_init()
  {
    counter_type  &counter  = *this;

    counter.start();
    counter.stop();
  }
};

which basically does this for you for any class on which you parameterize it, as in:

  performance_counter_init<tick_counter>  counter;

  some_timed_operation();

  counter.stop();

  dump(counter.get_milliseconds());

Call Costs

Any measurements on a system affect the behavior being measured. Therefore, an important characteristic of the performance classes (and their underlying timing functions) is the cost of the timing function calls. The first part of the analysis is to quantify the call costs of the functions.

Listing Eight shows the essentials of the counter_cost application. For each of the counter classes, the template test_cost() function is called, and the returned timing results, representing the total call costs, are printed to stdout.

The test_cost() function takes the form of an outer loop (which is executed twice in order to eliminate any caching effects, and the value of the second iteration is used), and an inner loop within which start() and stop() are called 1,000,000 times on an instance of the counter type being examined. The main application counter (which is an instance of highperformance_counter) measures the cost of the inner loop using the performance_counter_scope template.

Because the operating systems are on machines with widely different hardware, comparisons of the actual time costs over different systems are not meaningful. Since the call costs of GetTickCount() were lower than those of any other timing function (except GetSystemTimeAsFileTime() on XP), the results are expressed as a percentage of the GetTickCount() time on each platform to provide meaningful comparisons. The results are shown in Table 4.
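As a minimal sketch of that normalization, each measured cost can be divided by the GetTickCount() cost obtained on the same machine. The helper name percent_of_baseline is an assumption introduced here for illustration and is not part of WinSTL; the figures in the note after the code are hypothetical and are not taken from Table 4.

```cpp
#include <cassert>
#include <cstdint>

// Expresses a measured call cost as a percentage of a baseline cost
// measured on the same machine. Both arguments must use the same unit
// (for example, milliseconds for one million start()/stop() pairs).
// Hypothetical helper, not part of WinSTL.
inline double percent_of_baseline(std::int64_t cost, std::int64_t baseline)
{
    assert(cost >= 0 && baseline >= 0);

    // A zero baseline means the baseline run was too short to measure;
    // report 0 rather than divide by zero.
    if (baseline == 0)
    {
        return 0.0;
    }

    return 100.0 * static_cast<double>(cost) / static_cast<double>(baseline);
}
```

With a hypothetical baseline of 10ms for GetTickCount(), a counter that needs 30ms for the same number of calls is reported as 300 percent, whatever the absolute speed of the machine.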

The results clearly demonstrate that GetTickCount() has the lowest performance cost on all operating systems, except the single case of GetSystemTimeAsFileTime() on XP. Also clear is the fact that timeGetTime() costs between four and 69 times that of GetTickCount().


Table 4: Call cost of timing functions (as percentage of GetTickCount()).


On NT operating systems, GetSystemTimeAsFileTime() has barely any additional cost over GetTickCount(). It is also notable that GetSystemTimeAsFileTime() has a relatively better performance on later operating-system variants. However, on Windows 98, this call has an exceedingly high cost, nearly 8000 times that of GetTickCount(). QueryPerformanceCounter() has a high call cost on all operating systems, ranging from 49 to 2080 times that of GetTickCount().

The cost of GetThreadTimes() and GetProcessTimes() is very consistent over all flavors of NT operating systems (between 296 and 924 times that of GetTickCount()). Note that the figures are not shown for Windows 98, since these two functions are not implemented on 9x.

One final point is that QueryPerformanceCounter() has a higher cost than GetThreadTimes()/GetProcessTimes() on single-processor machines, but lower on multiprocessor machines. Presumably this is because access to the thread/system time infrastructure on multiprocessor machines requires synchronization, and that to the performance counter hardware does not.

Call Resolution

The other characteristic examined is that of the resolution of the various timing functions. Their documented resolutions are listed in Table 5. The second part of the analysis quantifies the actual resolutions of the functions.


Table 5: Resolution of timing functions.


Listing Nine shows the implementation of the counter_resolution application. For each of the counter classes, the test_resolution() template function is called, and the returned results, representing the minimum measured resolution for the counter class, are printed to stdout.

Listing Nine: Extract from counter_resolution.cpp


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_resolution.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>

#include <stlsoft_limit_traits.h>

/* ////////////////////////////////////////////////////////////////////// */

const int   C_ITERATIONS    =   1000000;

/* ////////////////////////////////////////////////////////////////////// */

template <ws_typename_param_k C>
inline ws_typename_type_k C::interval_type test_resolution(C &counter)
{
  typedef ws_typename_type_k C::interval_type interval_type;

  interval_type   min_inc = stlsoft::limit_traits<interval_type>::maximum();

  for(volatile int i = 0; i < C_ITERATIONS; ++i)
  {
    counter.start();

    // Execute a short inner loop of at most 2047 repeats (i & 0x7ff)
    for(volatile int j = 0; j < (i & 0x7ff); ++j)
    {}

    counter.stop();

    interval_type   interval = counter.get_microseconds();

    if( interval != 0 &&
        interval < min_inc)
    {
      min_inc = interval;
    }
  }

  return min_inc;
}

int main(int /* argc */, char* /* argv */[])
{

#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)    
 #define _counter_test_fmt	"%I64d"
#else
 #define _counter_test_fmt	"%lld"
#endif /* compiler */

#define _test_counter(_x)   \
  do \
  { \
    _x x; \
   \
    printf( #_x ": " _counter_test_fmt "us\n", \
            test_resolution(x)); \
  } \
  while(0)

  _test_counter(tick_counter);
  _test_counter(multimedia_counter);
  _test_counter(systemtime_counter);
  _test_counter(highperformance_counter);
  _test_counter(threadtimes_counter);
  _test_counter(processtimes_counter);
  _test_counter(performance_counter);

  return 0;
}

The test_resolution() function takes the form of an outer loop, which executes 1,000,000 times (C_ITERATIONS). Within that loop, an inner loop of at most 2047 iterations is executed, and its execution time measured. The minimum nonzero interval is recorded (some intervals are likely to be reported as 0) and returned as the result of the function. The results are shown in Table 5.
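The effect of a coarse clock on this test can be reproduced without any Win32 calls. The following sketch replaces the real counter classes with a hypothetical coarse_counter, whose simulated clock only advances in fixed steps, and applies the same minimum-nonzero-interval search. Both coarse_counter and min_nonzero_interval are names assumed here for illustration, and the simulated workload stands in for the empty inner loop of Listing Nine.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a counter class whose underlying clock only
// advances in steps of granularity_us microseconds.
class coarse_counter
{
public:
    explicit coarse_counter(std::int64_t granularity_us)
        : m_granularity(granularity_us), m_now(0), m_start(0), m_end(0)
    {
        assert(granularity_us > 0);
    }

    // Advances the simulated "true" time by the given number of microseconds.
    void advance(std::int64_t us) { m_now += us; }

    void start() { m_start = quantize(m_now); }
    void stop()  { m_end = quantize(m_now); }

    std::int64_t get_microseconds() const { return m_end - m_start; }

private:
    // Rounds a true time down to the last tick of the coarse clock.
    std::int64_t quantize(std::int64_t t) const
    {
        return (t / m_granularity) * m_granularity;
    }

    std::int64_t m_granularity;
    std::int64_t m_now;
    std::int64_t m_start;
    std::int64_t m_end;
};

// Times many short intervals and keeps the smallest nonzero reading,
// which approximates the step size of the underlying clock.
inline std::int64_t min_nonzero_interval(coarse_counter &counter, int iterations)
{
    std::int64_t min_inc = std::numeric_limits<std::int64_t>::max();

    for (int i = 0; i < iterations; ++i)
    {
        counter.start();
        counter.advance(i & 0x7ff); // simulated work of 0..2047 microseconds
        counter.stop();

        const std::int64_t interval = counter.get_microseconds();

        if (interval != 0 && interval < min_inc)
        {
            min_inc = interval;
        }
    }

    return min_inc;
}
```

With an assumed step of 10,000 microseconds, every reading is either 0 or 10,000, so the function returns 10,000 even though each simulated interval is far shorter. That is the pattern the real test exposes for the low-resolution timing functions.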

The results mainly illustrate that every timing function save QueryPerformanceCounter() (between 1µs and 5µs) has a significantly lower actual resolution than stated. The three exceptions are GetTickCount() and timeGetTime() on Windows 98, and timeGetTime() on one particular dual-processor Windows 2000 machine (though the other SMP 2000 machine does not show this). In all other cases, the best resolution ranges from 10ms to 20ms.

It is also interesting to note that for most machines, the resolutions obtainable from GetThreadTimes(), GetProcessTimes(), GetSystemTimeAsFileTime(), and timeGetTime() are (roughly) equivalent to that of GetTickCount(), suggesting that all these functions derive their timing information from a common low-resolution source.

No Single Solution

A summary of the characteristics of the counter classes (and their underlying timing functions) is given in Table 6. The first thing to note is that none is an out-and-out winner in every conceivable scenario. As mentioned in the introduction, the selection of a particular measurement function (or class) depends not only on its availability on your targeted platform(s) and on the type of measurement (systemwide/per-process/per-thread), but also on the actual resolution of the measurement and on its cost.


Table 6: Advantages and disadvantages of Win32 timing functions.


If you want user and/or kernel timings, then you must use either the threadtimes_counter or processtimes_counter classes, but these are only functional on NT operating systems.

If you want timings that give useful results on busy systems, again the threadtimes_counter or processtimes_counter classes are your choice. Since systems where you cannot suspend or terminate other busy processes are most likely to be high-performance servers, the specificity to NT systems is unlikely to be a problem.

If you want high timing resolutions, then you must use the highperformance_counter class. This does have a high call cost, but has the highest resolution by far. In addition, it appears that the call cost is relatively lower on newer operating systems, so the dissuasively high costs seen in Windows NT 4 are likely to be less and less significant in the future. (Note that highperformance_counter has a wrap time of 100+ years. Each time the processor speed doubles, the wrapping time will halve, so when we have 1-THz machines, we will have to worry about catching the wrapping.)
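The wrap-time arithmetic can be sketched as follows, assuming a signed 64-bit count that starts at zero and is incremented once per cycle at the given frequency. The function name wrap_time_years is hypothetical, and the frequency actually reported by QueryPerformanceFrequency() varies with the hardware, so the figures are indicative only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Approximate number of years until a signed 64-bit tick count, starting
// from zero, overflows when incremented frequency_hz times per second.
// Hypothetical helper for illustration.
inline double wrap_time_years(double frequency_hz)
{
    assert(frequency_hz > 0.0);

    const double max_ticks =
        static_cast<double>(std::numeric_limits<std::int64_t>::max());
    const double seconds_per_year = 365.25 * 24.0 * 60.0 * 60.0;

    return (max_ticks / frequency_hz) / seconds_per_year;
}
```

Under these assumptions a 1-GHz count lasts roughly 292 years and a 2-GHz count roughly 146 years, while a 1-THz count would wrap in well under a year.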

If minimal call cost is the most important factor, tick_counter or multimedia_counter should be used, but be aware that their 32-bit millisecond values wrap after about 49.7 days of system uptime, and that time spent suspended counts toward this: The value continues to be incremented when a machine is suspended. A simple program that demonstrates this is:

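  #include <stdio.h>
  #include <windows.h>

  int main()
  {
    DWORD const dw = ::GetTickCount();

    // Print the elapsed milliseconds twice a second
    for(; ; ::Sleep(500))
    {
      printf("%lu\n", static_cast<unsigned long>(::GetTickCount() - dw));
    }
  }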
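The subtraction used in that program remains correct across a single wrap, because unsigned 32-bit arithmetic is performed modulo 2^32. The following portable sketch shows this with plain integers rather than real GetTickCount() readings; the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed milliseconds between two readings of a 32-bit millisecond tick
// counter. Unsigned arithmetic is modulo 2^32, so the result is still
// correct if the counter wrapped once between the two readings.
inline std::uint32_t elapsed_ticks(std::uint32_t start, std::uint32_t now)
{
    return now - start;
}

// Days until a 32-bit millisecond counter wraps (2^32 milliseconds).
inline double tick_wrap_days()
{
    return 4294967296.0 / (1000.0 * 60.0 * 60.0 * 24.0);
}

// Usage example: the first reading is taken 0x100 ticks before the wrap
// and the second 0x100 ticks after it, so 0x200 ticks have elapsed.
inline void check_wrap_example()
{
    assert(elapsed_ticks(0xFFFFFF00u, 0x00000100u) == 0x200u);
}
```

What this cannot detect is an interval longer than one full wrap period (about 49.7 days), since two or more wraps are indistinguishable from one.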

On NT operating systems, systemtime_counter is almost as low cost, and it does not have the wrap problem.

If you require support on every operating system, without the use of any dispatching by either the preprocessor or at run time, then you must use tick_counter.

Overall, the choice of a class depends on the circumstances in which it is to be used. The information in this article should help when making such assessments, and the two programs described can be run on your target system(s) to provide more detailed information.

performance_counter

Because no single class provides the best solutions in all cases, a seventh counter class, performance_counter, is provided, which has the functionality of highperformance_counter where a high-performance hardware counter is available, otherwise defaulting to that provided by tick_counter. Its implementation is shown in Listing Ten. It also uses the late-evaluation and statics techniques to work out (one time only) whether the hardware counter support is present.

Listing Ten: Extract from winstl_performance_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_performance_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline /* static */ performance_counter::interval_type performance_counter::_query_frequency()
{
    interval_type   frequency;

    // If no high-performance counter is available ...
    if( !::QueryPerformanceFrequency(reinterpret_cast<LARGE_INTEGER*>
        (&frequency)) ||
        frequency == 0)
    {
        // ... then set the divisor to be the frequency for GetTickCount(), 
        // which is 1000 since it returns intervals in milliseconds.
        frequency = 1000;
    }

    return frequency;
}

inline /* static */ performance_counter::interval_type performance_counter::_frequency()
{
    static interval_type    s_frequency = _query_frequency();

    return s_frequency;
}

inline /* static */ void performance_counter::_qpc(epoch_type &epoch)
{
    ::QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&epoch));
}

inline /* static */ void performance_counter::_gtc(epoch_type &epoch)
{
    epoch = ::GetTickCount();
}

inline /* static */ performance_counter::measure_fn_type 
  performance_counter::_get_measure_fn()
{
    measure_fn_type fn;
    epoch_type      frequency;

    if(QueryPerformanceFrequency(reinterpret_cast<LARGE_INTEGER*>(&frequency)))
    {
        fn = _qpc;
    }
    else
    {
        fn = _gtc;
    }

    return fn;
}

inline /* static */ void performance_counter::_measure(epoch_type &epoch)
{
    static measure_fn_type  fn  =   _get_measure_fn();

    fn(epoch);
}

// Operations
inline void performance_counter::start()
{
    _measure(m_start);
}

inline void performance_counter::stop()
{
    _measure(m_end);
}

// Attributes
inline performance_counter::interval_type performance_counter::get_period_count()
   const
{
    return static_cast<interval_type>(m_end - m_start);
}

inline performance_counter::interval_type performance_counter::get_seconds() 
   const
{
    return get_period_count() / _frequency();
}

inline performance_counter::interval_type performance_counter::get_milliseconds() 
   const
{
    interval_type   result;
    interval_type   count   =   get_period_count();

    if(count < __STLSOFT_GEN_SINT64_SUFFIX(0x20C49BA5E353F7))
    {
        result = (count * interval_type(1000)) / _frequency();
    }
    else
    {
        result = (count / _frequency()) * interval_type(1000);
    }

    return result;
}

inline performance_counter::interval_type performance_counter::get_microseconds() 
   const
{
    interval_type   result;
    interval_type   count   =   get_period_count();

    if(count < __STLSOFT_GEN_SINT64_SUFFIX(0x8637BD05AF6))
    {
        result = (count * interval_type(1000000)) / _frequency();
    }
    else
    {
        result = (count / _frequency()) * interval_type(1000000);
    }

    return result;
}

Despite having to call the underlying timing functions via an additional indirection, the call costs of this class range from 101 to 106 percent of that of the highperformance_counter class over the range of systems used in this analysis.

A final point worth remembering is that if you do not need absolute times, only relative ones, then you should just call get_period_count() on instances of this, or any other, counter class.
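The two hexadecimal constants in get_milliseconds() and get_microseconds() are the largest signed 64-bit value divided by 1,000 and by 1,000,000 respectively. Below them the count can be multiplied first without overflowing, which preserves sub-second precision; above them the division is done first. The following sketch restates that guard in portable form. The function name scale_ticks is hypothetical, and the limit is computed rather than hard-coded.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Converts a raw tick count to a time unit, where units_per_second is
// 1000 for milliseconds or 1000000 for microseconds.
// Hypothetical helper mirroring the guard used in Listing Ten.
inline std::int64_t scale_ticks(std::int64_t count,
                                std::int64_t units_per_second,
                                std::int64_t frequency)
{
    assert(count >= 0 && units_per_second > 0 && frequency > 0);

    // Largest count that can be multiplied without overflowing 64 bits.
    const std::int64_t limit =
        std::numeric_limits<std::int64_t>::max() / units_per_second;

    if (count < limit)
    {
        // Multiply first: keeps the fractional part of a second.
        return (count * units_per_second) / frequency;
    }

    // Divide first: the intermediate value stays small, at the price of
    // truncating the count to whole seconds before scaling.
    return (count / frequency) * units_per_second;
}
```

For example, 1,500 ticks at a frequency of 1,000 yields 1,500 milliseconds on the multiply-first path, whereas dividing first would truncate the result to 1,000.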

References

Java 2 Performance and Idiom Guide, Craig Larman & Rhett Guthrie, Prentice-Hall PTR, 2000.

More Effective C++, Scott Meyers, Addison-Wesley, 1996.

More Exceptional C++, Herb Sutter, Addison-Wesley, 2002.

The five original classes — TickCounter, PerformanceCounter, SystemTimer, ThreadTimes, and ProcessTimes — were developed by my employer Synesis Software (http://synesis.com.au). They have been donated, and reworked somewhat, to form part of the WinSTL open-source project, which aims to apply STL programming techniques to the Win32 API in the form of a robust, lightweight, header-only library (http://winstl.org/).


Matthew Wilson holds a degree in Information Technology and a Ph.D. in Electrical Engineering, and is a software-development consultant for Synesis Software. Matthew's work interests are in writing bulletproof real-time, GUI, and software-analysis software in C, C++, and Java. He has been working with C++ for over 10 years, and is currently bringing STLSoft.org and its offshoots into the public domain. Matthew can be contacted via [email protected] or at http://stlsoft.org/.

Listing Four: Extract from winstl_highperformance_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_highperformance_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline /* static */ highperformance_counter::interval_type highperformance_counter::_query_frequency()
{
    interval_type   frequency;

    // If no high-performance counter is available ...
    if( !::QueryPerformanceFrequency(reinterpret_cast<LARGE_INTEGER*> (&frequency)) ||
        frequency == 0)
    {
        // ... then set the divisor to be the maximum value, guaranteeing that 
        // the timed periods will always evaluate to 0.
        frequency = stlsoft_ns_qual(limit_traits)<interval_type>::maximum();
    }

    return frequency;
}

inline /* static */ highperformance_counter::interval_type highperformance_counter::_frequency()
{
    static interval_type  s_frequency = _query_frequency();

    return s_frequency;
}

// Operations
inline void highperformance_counter::start()
{
    ::QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&m_start));
}

inline void highperformance_counter::stop()
{
    ::QueryPerformanceCounter(reinterpret_cast<LARGE_INTEGER*>(&m_end));
}

// Attributes
inline highperformance_counter::interval_type highperformance_counter::get_seconds() const
{
    return get_period_count() / _frequency();
}

inline highperformance_counter::interval_type highperformance_counter::get_milliseconds() const
{
    highperformance_counter::interval_type  result;
    highperformance_counter::interval_type  count   =   get_period_count();

    if(count < __STLSOFT_GEN_SINT64_SUFFIX(0x20C49BA5E353F7))
    {
        result = (count * interval_type(1000)) / _frequency();
    }
    else
    {
        result = (count / _frequency()) * interval_type(1000);
    }

    return result;
}

inline highperformance_counter::interval_type highperformance_counter::get_microseconds() const
{
    highperformance_counter::interval_type  result;
    highperformance_counter::interval_type  count   =   get_period_count();

    if(count < __STLSOFT_GEN_SINT64_SUFFIX(0x8637BD05AF6))
    {
        result = (count * interval_type(1000000)) / _frequency();
    }
    else
    {
        result = (count / _frequency()) * interval_type(1000000);
    }

    return result;
}


Listing Five: Extract from winstl_threadtimes_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_threadtimes_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline threadtimes_counter::threadtimes_counter()
    : m_thread(::GetCurrentThread())
{
}

// Operations
inline void threadtimes_counter::start()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetThreadTimes(   m_thread,
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelStart),
                        reinterpret_cast<LPFILETIME>(&m_userStart));
}

inline void threadtimes_counter::stop()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetThreadTimes(   m_thread,
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelEnd),
                        reinterpret_cast<LPFILETIME>(&m_userEnd));
}

// Attributes

// Kernel
inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_period_count() const
{
    return static_cast<interval_type>(m_kernelEnd - m_kernelStart);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_seconds() const
{
    return get_kernel_period_count() / interval_type(10000000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_milliseconds() const
{
    return get_kernel_period_count() / interval_type(10000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_kernel_microseconds() const
{
    return get_kernel_period_count() / interval_type(10);
}

// User
inline threadtimes_counter::interval_type threadtimes_counter::get_user_period_count() const
{
    return static_cast<interval_type>(m_userEnd - m_userStart);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_user_seconds() const
{
    return get_user_period_count() / interval_type(10000000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_user_milliseconds() const
{
    return get_user_period_count() / interval_type(10000);
}

inline threadtimes_counter::interval_type threadtimes_counter::get_user_microseconds() const
{
    return get_user_period_count() / interval_type(10);
}

// Total
inline threadtimes_counter::interval_type threadtimes_counter::get_period_count() const
{
    return get_kernel_period_count() + get_user_period_count();
}

inline threadtimes_counter::interval_type threadtimes_counter::get_seconds() const
{
    return get_period_count() / interval_type(10000000);
}

inline threadtimes_counter::interval_type    threadtimes_counter::get_milliseconds() const
{
    return get_period_count() / interval_type(10000);
}

inline threadtimes_counter::interval_type    threadtimes_counter::get_microseconds() const
{
    return get_period_count() / interval_type(10);
}

Listing Six: Extract from winstl_processtimes_counter.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_processtimes_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

inline /* static */ HANDLE processtimes_counter::_get_process_handle()
{
	static HANDLE	s_hProcess	=	::GetCurrentProcess();

	return s_hProcess;
}

// Operations
inline void processtimes_counter::start()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetProcessTimes(  _get_process_handle(),
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelStart),
                        reinterpret_cast<LPFILETIME>(&m_userStart));
}

inline void processtimes_counter::stop()
{
    FILETIME    creationTime;
    FILETIME    exitTime;

    ::GetProcessTimes(  _get_process_handle(),
                        &creationTime,
                        &exitTime,
                        reinterpret_cast<LPFILETIME>(&m_kernelEnd),
                        reinterpret_cast<LPFILETIME>(&m_userEnd));
}

Listing Seven: Extract from winstl_performance_counter_scope.h


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_performance_counter_scope.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// class performance_counter_scope
template <ws_typename_param_k T>
class performance_counter_scope
{
public:
    typedef T                               counter_type;
    typedef performance_counter_scope<T>    class_type;

public:
    ws_explicit_k performance_counter_scope(counter_type &counter)
        : m_counter(counter)
    {
        m_counter.start();
    }
    ~performance_counter_scope()
    {
        m_counter.stop();
    }

    void stop()
    {
        m_counter.stop();
    }

    // This method is const, to ensure that only the stop operation 
    // (via performance_counter_scope::stop()) is accessible 
    // on the managed counter.
    const counter_type &get_counter() const
    {
        return m_counter;
    }

// Members
protected:
    T   &m_counter;

// Not to be implemented
private:
    performance_counter_scope(class_type const &rhs);
    class_type const &operator =(class_type const &rhs);
};
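The value of the scope class is that stop() is called however the enclosing block is left. The following sketch demonstrates this with a simplified restatement of the template, written without the WinSTL macros, and a hypothetical recording_counter that counts calls instead of reading a clock. All three names are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical counter that records calls instead of reading a clock.
struct recording_counter
{
    int starts;
    int stops;

    recording_counter() : starts(0), stops(0) {}

    void start() { ++starts; }
    void stop()
    {
        assert(starts > stops); // stop() must follow a start()
        ++stops;
    }
};

// Simplified restatement of the scope class: start() on construction,
// stop() on destruction.
template <typename T>
class counter_scope
{
public:
    explicit counter_scope(T &counter)
        : m_counter(counter)
    {
        m_counter.start();
    }
    ~counter_scope()
    {
        m_counter.stop();
    }

private:
    T &m_counter;

    // Not to be implemented
    counter_scope(counter_scope const &);
    counter_scope &operator =(counter_scope const &);
};

// Times a block of work. The destructor of scope calls stop() on both
// return paths.
inline bool timed_step(recording_counter &counter, bool bail_out_early)
{
    counter_scope<recording_counter> scope(counter);

    if (bail_out_early)
    {
        return false; // stop() is still called here
    }

    return true;
}
```

After one call that returns early and one that runs to completion, the counter has recorded two start() calls and two stop() calls.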

