High-Resolution Timing

For many applications, the PC's built-in timer just isn't fast enough. Thomas untangles the PC timer, then offers up three externally referenced timer functions--along with a program that verifies them--that provide high-resolution timing.

September 01, 1992
URL:http://www.drdobbs.com/parallel/high-resolution-timing/184408835

Figure 1(a)

Figure 1(b)

SEP92: HIGH-RESOLUTION TIMING

HIGH-RESOLUTION TIMING

A fast, tight timer for PCs

This article contains the following executables: HRT.ARC

Thomas Roden

Thomas is a senior software engineer at Advanced Logic Research, 9401 Jeronimo, Irvine, CA 92718 or at rodentia @alr.com.

IBM-compatible PCs have always incorporated high-resolution timers (the Intel 8254 and equivalents) that govern short-duration functions like RAM refresh and speaker-tone generation, and longer-duration functions such as time and day determinations. Timers operate by counting pulses at a given frequency. With PCs, the timer input frequency is 1.19318 MHz--going into a 16-bit timer. Frequencies that are integer quotients (up to 1/65536) of the base frequency can be generated. This allows for a minimum frequency of approximately 18.2 Hz (corresponding to a period of about 55 milliseconds).

In PC parlance, the 55-millisecond period is often called a tick; much of the rest of the world, notably Macintosh programmers, use the term to refer to 1/60 second. Since the PC tick consists of 65,536 periods of the timer input (approximately 838 nanoseconds), it is natural to name this period. In the interest of cuteness, I'll refer to it as the "ticklet."

To allow longer durations to be timed, the output of one of the timers (Timer 0) is connected to an interrupt line (IRQ 0). The PC is programmed to respond to this interrupt by simulating a 21-bit timer cascaded from the 16-bit hardware timer. This 21-bit timer is large enough to hold time values up to 24 hours. Under DOS, its overflow is fed to the system date value, which should be good at least until the year 2000.

Time-of-day is normally needed only to the second resolution or slightly better, so Timer 0 is set to the minimum frequency. This allows the PC to spend as little time as possible simulating the additional timer bits, leaving more time for the user's program. Normal time-of-day functions read only the cascaded software timer (also known as the tick count), offering resolutions down to one tick.

Some applications require higher timer resolution, to which there are two popular approaches. For periods under one tick, the speaker timer (Timer 2) can be programmed to start and stop on cue (much like a stopwatch). This supplies resolution down to one ticklet. This ideal limit is degraded by the time it takes to access the timer gate and any other latencies in the control software. Three-ticklet resolution is generally attainable. The main side effect of this technique is that the speaker is unavailable during timing.

For periods over one tick, Timer 0 can be accelerated, causing the tick count to increase more quickly. Typically, a separate tick count is maintained, and the original IRQ 0 interrupt is occasionally simulated to preserve the integrity of the system tick count. The main side effect of this technique is increased system overhead. Increasing the IRQ 0 frequency to the point of supporting millisecond resolution seriously impacts system performance.

A technique outlined by Jerry Jongerius in his article "Accurately Timing Windows Events Without Timer Reprogramming" (Microsoft Systems Journal, July 1991) demonstrates that Timer-0 state can be read and combined with the normal system tick count to supply enhanced timer resolution without additional system overhead. Jongerius realizes 1-millisecond resolution with 100-microsecond resolution on faster machines.

A slightly different approach that streamlines Timer-0 reading and integrates 8259 Programmable Interrupt Controller (PIC) data allows resolution under 16 microseconds on most machines. Here's how it's done: The essence of high-resolution, nonintrusive timing is to combine the ticklet portion of the current time with a tick count. This is accomplished by reading Timer 0 to determine the number of ticklets that have passed since the last tick interrupt, using that value as the low word of the result, and using the low-order portions of the system tick count as the higher-order portions of the result.

Timer 0 is typically loaded with the terminal count of 0 (which corresponds to 65,536 in this instance) and set to mode 3 (square wave generator). In this mode, the timer is loaded with 0 and the output goes high. It then counts down, by twos, to two. The output goes low, the timer reloads with 0, and counts down again. The sequence repeats for each tick; see Figure 1(a) and Figure 1(b).

To convert the available timer-state information to a ticklet count, it is first necessary to translate the count down to a count up. The fact that a count of 0 is in fact the maximum value must also be handled. Since counting is by twos, this value is halved, producing a number from 0 to 32,767 which repeats once within the tick.

To resolve the ambiguity of a repeating count, the value of the timer output is polled to determine if this is the first or second count down. If the output is high, it is the first count down. This supplies the 15th bit to produce the ticklet count of 0 to 65,535.

This sounds like enough to produce the desired effect, but there are some caveats. Interrupts must be disabled while the timer is polled to avoid inconsistent ticklet and tick values. If the ticklet count was near maximum when interrupts were disabled, it could overflow before read and be combined with a tick count that is one too low (producing an error on the order of 65,536 ticklets!).

Fortunately, this condition can be detected. After the timer is read, the Programmable Interrupt Controller is interrogated to see if it has an IRQ 0 pending; see Figure 1(c). If so, the calculated ticklet value may correspond to a tick count of one greater than that in memory. It is not safe to simply add to the tick count when combining, because it is also possible to read the timer just before overflow and err in the other direction. The optimal solution is to assume that the ticklet count is about to overflow (has a value of 65,535) and combine it with the current tick count.

There is also a "feature" on some platforms that the timer output will in fact change early when the timer count is still two and not yet reloaded. This is not entirely compatible with the original Intel part, but it is out there, so it must be accommodated. The best approach is to reject the current values and read the hardware once more. It is almost impossible to read unusable values twice in a row, and would not delay longer than 55 milliseconds, at worst.

It is also possible to read the timer during the "null count" phase when the count is invalid due to reloading. Again, the best approach is to reject the values and reread the hardware. Doing so yields a ticklet count suitable for direct merging with a tick count; see Figure 1(d), Figure 1(e), and Figure 1(f). Even where there are inaccuracies, they are never glitchy: Time is always read as an increasing value, and discontinuities are kept to a minimum.

It's Only as Good as DOS

Unfortunately, the system tick count is not entirely stable. When the time reaches midnight, the count is cleared by DOS. Since it doesn't transition on a power of two, its error can propagate down to any timing interval. Thus, any routine using the system tick count must be prepared for such an error.

It is possible to add checking to determine if a day overflow has occurred, although this tends to add some overhead to every timer interrogation. There are other ways to handle such cases. These typically involve range-checking intervals and supplying defaults when they are out of range.

How to Do Better

The midnight-reset problem can be remedied by keeping a parallel tick count that rolls over on a power-of-two boundary. This can be accomplished by hooking an interrupt service routine (ISR) to IRQ 0. This does increase system overhead, but only slightly. Sixteen bits of count allow for a one-hour range. Thirty-two bits of count allow for a whopping seven-year range.

The ISR should increment its count before transferring to the original interrupt code. This ensures total consistency between the count and the ticklet state by not issuing an end of interrupt (EOI) until after the count is incremented. For the same reason, IRQ 0 (INT 08h) and not INT 1Ch should be the interrupt that is intercepted.

It may be safe to intercept at a different point (such as INT 1Ch) so long as it is not possible to call the ticklet reading code between the times that the IRQ 0 is acknowledged and the IRQ 0 ISR is complete. This would depend on the environment in which this technique is used.

How Fast do You Want to Go?

Combining the ticklet count with a parallel tick count can yield 48 bits of information spanning a seven-year period. The lower bits are of questionable accuracy, and the upper bits may correspond to uninteresting intervals. The number of bits to use in a count is the count's dynamic range. Where that window of bits is positioned is the resolution.

It is very inconvenient to deal with variables of greater than 32 bits in most environments. Even variables of greater than 16 bits are slower to process under DOS. This makes it useful to tailor the range and resolution of the count to one's application. Table 1 provides some examples.

Table 1: Tailoring the range and resolution of the count to one's application. (Numbers are approximate, so before you use this information in a programmable pacemaker, do your own timing analysis.)

  Minimum   Maximum   Variable  Shift
  Interval  Interval  Size
  ------------------------------------

    838 ns  1 hr      32 bits     0
   1676 ns  2 hrs     32 bits     1
   3352 ns  4 hrs     32 bits     2
   3352 ns  0.2 sec   16 bits     2
   13.4 us  0.88 sec  16 bits     4
   26.8 us  32 hrs    32 bits     5
    215 us  14 sec    16 bits     8

To avoid difficult situations, use maximum intervals of twice the durations expected. It is feasible to time intervals near the maximum, but they require sufficiently frequent polling to avoid undetected overflow.

Conditioning the Results

Timer output is an unsigned value. The general case of timing is of the interval from one event to another. At the first event, the timer is sampled and its value is stored. At the second output, the timer is again sampled, and this value is subtracted from the first. This yields a value that is correct even if there is a single timer roll over, so long as the range of the timer is not exceeded.

Timer overflow can be handled by monitoring that the timer in question has overflowed and setting a bit and/or ceasing to increment the timer. This requires additional IRQ-0 handling and is generally more difficult than simply using a wider counter.

The Right Tool for the Job

There are two major uses for a timer: to record the interval between two events, and to wait until such an interval has elapsed. These uses are analogous to a stopwatch and an alarm clock, respectively. Each task suggests different strategies for dealing with the inaccuracies inherent in DOS-dependent timing systems.

Interval recording is usually for the purpose of statistics gathering. Under these circumstances, it is usually acceptable to clip erroneous values to a "reasonable" range or drop them altogether. A value can be considered erroneous when it falls outside of an expected range.

Elapsed time checking is useful for real-time control systems (and games), where it should be guaranteed that a minimum amount of time passes between two events. The approach for this situation depends on which (hopefully infrequent) error behavior is more acceptable: delaying not long enough or delaying up to twice the requested duration.

Does Anybody Really Know What Time it Is?

Working in units of ticklet is all fine and good for our discussion, but it is certainly not a unit recognized by the National Institute of Standards and Technology. Doing all the work in a common unit would add complexity, overhead, and loss of precision to every step. A more effective approach is to supply routines to convert between ticklets and whatever fraction of second you use.

The conversion from ticklet to microsecond can be accomplished in integer by multiplying by 88/105. This introduces an error of roughly one part in seven million. Microseconds can be converted to ticklets with the inverse (105/88) with the same error. Ticklet/millisecond conversions can be accomplished with the fractions 11/13,125 and 13,125/11.

There are fractions that convert more accurately, but they handle only smaller values, due to intermediate overflow. The precise ticklet/millisecond fractions involved are based on 12,000,000/14,318,180 and its inverse.

Floating point will give better accuracy and less chance of overflow, but it is so slow as to risk affecting the timing. The selection of the suggested fractions was a bit more than pure guess work. It involved weighing the merits of accuracy, range, and compute time. (In fact, this topic is interesting to the point of deserving its own article.)

I Don"t Do Windows

The supplied implementation is designed for DOS, and may not work properly under other operating environments. In particular, if the Timer-0 frequency is changed, even by one, the assumptions made herein become invalid. One reason is that the behavior of a timer in mode 3 is different for even and odd initial counts. There would also be increased difficulty scaling the count for merging with a tick count.

Environments that virtualize I/O, such as Windows 386 Enhanced Mode, also have a problem. Not only is there a severe performance degradation, but there are also additional requirements to the environment. Operations that change the state of the processor interrupt-enable bit must be honored. There is also a relationship between the state of the timer and interrupt controller that must be simulated or preserved.

As Jongerius points out, Virtual Device Driver (VxD) programming techniques can alleviate these problems. Actually, this technique could be used to improve the performance of the Windows GetTickCount call if it were implemented by Microsoft.

Unfortunately, this issue doesn't affect Windows programs exclusively. If this technique is used in a DOS program running in a 386 Enhanced DOS box, it's values will be highly suspect. Discontinuities and glitchy results become common.

The Prize at the Bottom of the Box

HRTIME.ASM (Listing One, page 110) contains three externally referenced functions. hrt_open() is an initialization function. When built to use a parallel tick count, it installs the interrupt routine and clears the tick count. hrt_close() unhooks any interrupt routine installed by hrt_open().

hrtime() is the function that does the high-resolution time determination. It returns an unsigned long (assumed to be 32 bits) value of the number of ticklets either from the time hrt_open() was called (if built for parallel tick count) or since midnight.

I've also written a sample C program that uses the hrtime functions. Due to space considerations, this program and programmer's notes are available electronically. This program verifies that the technique I've described here works on your system. Please report any platforms on which the unmodified form of the technique fails. The program also illustrates how the major forms of timings are used and demonstrates conversion between ticklets and microseconds or milliseconds.



_HIGH-RESOLUTION TIMING_
by Thomas Roden

[LISTING ONE]



; hrtime.asm - Hi Resolution TIMEr for dos. Copyright (c) 1991 Thomas A. Roden
; All Rights Reserved (with one exception). The right to freely distribute this
; source and any executable code it creates is granted, provided that this
; copyright notice is included in the source. It is requested that the author's
; name (Thomas A. Roden) be included in the acknowledgements of any product
; including this code, but this request is in no way legally binding.

IO_DELAY_NEEDED equ 0   ; doesn't seem to need recovery time here
PRVT_TICKS_USED equ 1   ; if a private tick count is to be used
IRET_IF_RESTORE equ 0   ; if the flag is to be restored with an IRET,
                ; or with the windows suggested method
; io_delay - delay just a bit
io_delay    macro
if IO_DELAY_NEEDED
    jmp short $+2
endif   ; IO_DELAY_NEEDED
endm

PIC0_ADDR   equ 020h    ; I/O address of PIC 0
TIMER0      equ 040h    ; I/O address of Timer 0
TIMER_STAT  equ 043h    ; I/O address for status/control of timers 0-2

PIC_RIRR    equ 00Ah    ; Read Interrupt Request Register of a PIC
T0_R_S_C    equ 0C2h    ; Timer 0 Read Status and Count

DOS_GLOBALS equ 040h    ; segment for 40:xx variables
SYS_TIMER_CNT   equ 06Ch    ; offset for system count
 .model large
 .code
assume  ds:nothing, es:nothing

if PRVT_TICKS_USED
hrt_ticks   dd  0
old_int8    dd  0

; hrt_isr - interrupt routine for private tick counting for hi-res timer
    public  _hrt_isr
_hrt_isr    proc    far
    add word ptr cs:[hrt_ticks], 1
    adc word ptr cs:[hrt_ticks+2], 0

    jmp dword ptr cs:[old_int8]
_hrt_isr    endp
endif   ; PRVT_TICKS_USED

; hrt_open - the init function for the hi-res timer
    public  _hrt_open
_hrt_open   proc    far
    push    ds      ; save this stuff to avoid bad crashes
    push    es      ; save this stuff just to be paranoid
    push    bx
    push    cx
    push    dx
if PRVT_TICKS_USED
    xor ax, ax
    mov word ptr cs:[hrt_ticks], ax
    mov word ptr cs:[hrt_ticks+2], ax
    mov ax, 03508h  ; get int vector for int8 (irq0)
    int 21h
    mov word ptr cs:[old_int8], bx
    mov ax, es
    mov word ptr cs:[old_int8+2], ax
    mov dx, seg _hrt_isr
    mov ds, dx
    mov dx, offset _hrt_isr
    mov ax, 02508h  ; set int vector for int8 (irq0)
    int 21h
endif   ; PRVT_TICKS_USED
    pop dx
    pop cx
    pop bx
    pop es
    pop ds
    xor ax, ax
    ret
_hrt_open   endp
; hrt_close - the un-init function for the hi-res timer
    public  _hrt_close
_hrt_close  proc    far
    push    ds      ; save this stuff to avoid bad crashes
    push    es      ; save this stuff just to be paranoid
    push    bx
    push    cx
    push    dx
if PRVT_TICKS_USED
    mov dx, word ptr cs:[old_int8+2]
    mov ds, dx
    mov dx, word ptr cs:[old_int8]
    mov ax, 02508h  ; set int vector for int8 (irq0)
    int 21h
endif   ; PRVT_TICKS_USED
    pop dx
    pop cx
    pop bx
    pop es
    pop ds
    xor ax, ax
    ret
_hrt_close  endp
; hrtime - the hi-res time reader
    public  _hrtime
_hrtime proc    far
if IRET_IF_RESTORE
    pop bx      ; return ip - to make iret return frame
    pop ax      ; return cs
    pushf           ; flags
    push    ax      ; cs
    push    bx      ; ip
else    ; IRET_IF_RESTORE
    pushf           ; store flags on stack to check at end
endif   ; IRET_IF_RESTORE
    cli             ; freeze ticker while checking
hrt_readhw:
    mov al, T0_R_S_C        ; timer 0 read stat and count
    out TIMER_STAT, al
    io_delay
    in  al, TIMER0      ; store stat in ch
    mov ch, al
    io_delay
    in  al, TIMER0      ; store count in bx
    mov bl, al
    io_delay
    in  al, TIMER0
    mov bh, al
    io_delay            ; delay between timer and pic access
    mov al, PIC_RIRR
    out PIC0_ADDR, al
    io_delay
    in  al, PIC0_ADDR
    test    al, 001h        ; check if system time is stale
    jz  short hrt_timegood
    mov ax, 0ffffh      ; force max in lower part
    jmp short hrt_gotlo
hrt_timegood:
    test    ch, 040h
    jnz short hrt_readhw    ; timer invalid, retry
    mov ax, bx          ; move count to more convenient reg
    neg ax          ; convert to count up
    shl ch, 1           ; bash ch to get high bit of status
                    ; (output state) into high bit of
                    ; count via carry
    cmc             ; invert carry to match count negation
    rcr ax, 1           ; mode 3 counts down twice by twos,
                    ; first with output high, then low
hrt_gotlo:
; This would be the place to shift ax down if less resolution but greater range
; were needed. The merging with system ticks or parallel ticks would have to
; involve the same shifting.
if PRVT_TICKS_USED  ; if a parallel tick count is to be used
    mov dx, word ptr cs:[hrt_ticks]
else    ; PRVT_TICKS_USED
    mov dx, DOS_GLOBALS
    mov es, dx
    mov dx, es:[SYS_TIMER_CNT]  ; add system time
endif   ; PRVT_TICKS_USED
if IRET_IF_RESTORE
    iret
else    ; IRET_IF_RESTORE
    pop bx          ; get flags down to restore interrupt flag
    test    bx, 0200h       ; was IF set? (jump if no)
    jz  short hrt_if_done
    sti             ; restore interrupts to ON
hrt_if_done:
    ret
endif   ; IRET_IF_RESTORE
_hrtime endp

end