Cort Dougan is Chief Technical Officer at Finite State Research. He can be contacted at email@example.com.
Very few sites that use time synchronization systems have tested them. When asked how they know that the software or hardware is doing what is expected, the response is usually "the documentation says so" or "the software itself reports all is ok". Few tools exist for testing time synchronization easily, and few test methods are easy to reproduce, which is a likely cause of this problem.
Our tests show that NTP, one of the more popular time synchronization systems, exhibits some undesirable behavior, including poor tracking of a single time source, significant offset from the desired time for long periods, and erratic "jumps" in time during startup. Running untested time synchronization software is dangerous because it exposes applications to unpredictable behavior of the system clock. This is especially dangerous for sites that have external or customer requirements for time synchronization.
All of our tests were run with NTP version 4.2.4p6 on a stock RedHat Enterprise 5.3 installation on Supermicro 64-bit AMD quad-core servers. We used a Garmin GPS 16HVS as an external time reference. This GPS provides a pulse-per-second (PPS) signal: a 5 V pulse produced at every second-to-second transition with an accuracy of +/- 250 nanoseconds. The PPS line was run from the GPS to the serial port of each computer. This allowed the server and client machines to compute an offset from a common reference each second, giving an accurate measure of the offset between the clocks on the systems. The PPS signal was also used as the reference time for the server to synchronize to, so that it could provide synchronized time to other systems.
The main measurement I present in this article is the difference between the Linux clock (as provided to user programs) and the GPS PPS signal on both the server system and the client system. These values show how well the synchronization between the client and the reference time source is being maintained through the server. To obtain this measurement, we wrote a Linux kernel module that isolates one processor of the system and runs in a busy-wait loop looking for the PPS signal from the GPS. Once the signal is seen, the Linux system time is noted through a gettimeofday() system call and the offset from the nearest second is calculated. A value of 0.0 seconds shows that the Linux time is exactly in sync with the external signal and that the second transitions of the Linux clock are perfectly aligned with the GPS second transitions. If both the server and client show the same offset value, then they are perfectly synchronized with each other even though they may be offset from the GPS time. By routing the PPS signal from a single GPS to both the client and server, we are able to measure the system time offset from a common reference on multiple systems at the same time.
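The offset calculation described above can be illustrated with a short user-space sketch in Python. This is not the kernel module used in the tests. The function names, the use of integer nanosecond timestamps, and the sign convention (positive means the system clock is ahead of the PPS edge) are assumptions made for illustration only.

```python
NS_PER_SEC = 1_000_000_000


def offset_from_nearest_second_ns(timestamp_ns: int) -> int:
    """Signed distance from the nearest whole second, in nanoseconds.

    timestamp_ns is the system clock reading captured at the moment the
    PPS edge was seen. Assumed convention: a positive result means the
    system clock is ahead of the PPS edge, a negative result means it
    is behind. The result lies in [-0.5 s, +0.5 s).
    """
    frac = timestamp_ns % NS_PER_SEC
    # A fraction of half a second or more is closer to the next second,
    # so it is reported as a negative offset. Exactly half a second is
    # ambiguous and is reported as -0.5 s here.
    return frac if frac < NS_PER_SEC // 2 else frac - NS_PER_SEC


def relative_offset_ns(server_offset_ns: int, client_offset_ns: int) -> int:
    """Client clock minus server clock, both measured against the same PPS edge."""
    return client_offset_ns - server_offset_ns


# Hypothetical readings taken at the same PPS edge on two machines.
server = offset_from_nearest_second_ns(12_000_250_000)  # 250 us ahead
client = offset_from_nearest_second_ns(11_999_750_000)  # 250 us behind
print(server, client, relative_offset_ns(server, client))
```

Because both readings are taken against the same physical pulse, the GPS time cancels out of the difference. Two machines that report equal offsets are synchronized with each other even when neither offset is zero.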
No load was placed on the systems used in these tests, and the network was not loaded, except where noted. Each machine was connected to a Linksys SR2024 gigabit switch. Each test was started with the local system time set one half-second behind a reference time (a remote NTP server) to give each system a reproducible startup scenario and to show how each deals with initial setup and starting error.