Five years ago at FSMLabs, our real-time operating-system software had grown to support so many features and platforms that it was beyond the scope of our small technical staff to fully test and still develop the system at a reasonable pace. The dilemma we faced was that on one hand, users of real-time software expect and require immense amounts of testing and validation before seeing the software. On the other hand, we wanted to maintain our relatively small technical staff of highly skilled and focused developers. Our size let us put customers directly in contact with the people who write the code, and that was something we didn't want to change. Consequently, we couldn't assign a large group of test engineers to the daily execution of a test system, nor could we employ any of the available large and inflexible test systems.
We searched for a system that would fit our requirements, but found nothing of use. The general Linux/BSD test community is primitive, relying for the most part on "many users who act as beta testers." The Linux Test Project (LTP) doesn't include tests for the latest kernel, and the tests it does offer are superficial and inconsistent. The best you can find at the Open Source Development Labs (OSDL) is a listing of "uptimes" for platforms from the OSDL test site. And while there are testing tools and harnesses for general use, they're not designed for operating system work. With this in mind, we built a test infrastructure that did not require great effort to develop or maintain.
What the System Does
In its current form, our test system does a build of all software components and tools, boots each computer type we support, and runs a set of measurement, specification, and regression tests. It then reports all of this in a web page, packaging the compiled and assembled pieces on a remote server where we can master CDs, provide automated downloads, or access copies ourselves.
First of all, we needed our build system to do what most test systems do: build the system and make sure no one had broken the build in the past 24 hours. Linux-based builds are run on the build "master" server, "Sparky," which is a Linux host. NetBSD, FreeBSD, and OpenBSD builds are started on remote hosts dedicated specifically for those builds. This lets us run many builds in parallel, but report all the results and data on a single host (Sparky). As the system grew, we could add compile capacity to it with new hosts.
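As a rough illustration of running builds in parallel while collecting every result in one place, here is a minimal sketch. It is not the actual FSMLabs script: the `run_build` function, the `RESULT_DIR` variable, the status-file layout, and the host and script names in the commented example are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: run several builds at once and record each result in one place.
# RESULT_DIR, run_build, and the commands in the example are hypothetical.

RESULT_DIR=${RESULT_DIR:-./results}

# run_build NAME COMMAND [ARGS...]
# Runs COMMAND, saves its output to NAME.log, and writes the build state
# ("in progress", then "success" or "failed") to NAME.status.
run_build() {
    name=$1
    shift
    mkdir -p "$RESULT_DIR" || return 1
    echo "in progress" > "$RESULT_DIR/$name.status"
    if "$@" > "$RESULT_DIR/$name.log" 2>&1; then
        echo "success" > "$RESULT_DIR/$name.status"
    else
        echo "failed" > "$RESULT_DIR/$name.status"
    fi
}

# Example (commented out so that sourcing this file has no side effects):
#   run_build linux  ./build.sh &                      # local build on the master
#   run_build netbsd ssh netbsd-builder ./build.sh &   # remote build over ssh
#   wait                                               # block until both finish
```

Because each build runs as a background job and writes only to its own status and log files, adding compile capacity amounts to adding another `run_build` line that points at a new host.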
This process starts with a complete build of our toolchain: binutils, GCC, glibc, and all the target filesystem utilities (used for embedded systems and NFS root filesystem). Everything is built from scratch to ensure that the entire build process, which is all implemented in shell scripts, works. Because we target x86, x86 64-bit, PowerPC 6xx (including AltiVec support), PowerPC 4xx, PowerPC 8xx, PowerPC e500, FRV, MIPS, and ARM/XScale, we end up building an entire toolchain and root filesystem for a number of architectures.
The system then builds the Linux/BSD kernels (we have about 25 different source trees that need to be built), RTCore (the real-time kernel), and all real-time drivers and add-ons to RTCore (real-time networking, memory protection, interfaces, and so on). While the builds run, they show "in progress" on the web page. Once complete, they display success or failure. Each entry is color-coded: blue for in progress, red for failed, and green for success.
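The color coding can be sketched as a small shell function that turns a build state into a table row for the status page. The function names, the HTML layout, and the gray fallback for unknown states are assumptions; only the blue/red/green mapping comes from the description above.

```shell
#!/bin/sh
# Sketch only: map a build state to the color used on the status page.

# status_color STATE: print the color for "in progress", "failed", or "success".
status_color() {
    case "$1" in
        "in progress") echo blue ;;
        failed)        echo red ;;
        success)       echo green ;;
        *)             echo gray ;;   # assumption: fallback for unknown states
    esac
}

# status_row NAME STATE: print one HTML table row for a build.
status_row() {
    printf '<tr><td>%s</td><td bgcolor="%s">%s</td></tr>\n' \
        "$1" "$(status_color "$2")" "$2"
}
```

Calling `status_row x86 success` would emit a row with a green cell, and regenerating the page periodically is enough to move entries from blue to red or green as builds finish.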
When the entire build test for a given architecture is complete, the Linux, NetBSD, or FreeBSD kernel is copied to the tftp boot file location on our server, the root filesystem for the tests is copied to the NFS server, and the build for the next architecture/configuration is started. While the next build continues, the current tests commence.
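The hand-off from build to test might look something like the following sketch. The directory defaults, the `stage_for_test` function, and the per-board file naming are hypothetical; real tftp and NFS layouts depend on how the boot loader and exports are configured.

```shell
#!/bin/sh
# Sketch only: copy a freshly built kernel and root filesystem to the places
# a test board boots from. All paths and names here are assumptions.

TFTP_DIR=${TFTP_DIR:-/tftpboot}
NFS_DIR=${NFS_DIR:-/export/nfsroot}

# stage_for_test KERNEL ROOTFS_DIR BOARD
# Copies KERNEL to the tftp directory and the contents of ROOTFS_DIR to the
# board's NFS root. Returns non-zero if any step fails.
stage_for_test() {
    kernel=$1
    rootfs=$2
    board=$3
    mkdir -p "$TFTP_DIR" "$NFS_DIR/$board" || return 1
    cp "$kernel" "$TFTP_DIR/$board.kernel" || return 1
    cp -R "$rootfs/." "$NFS_DIR/$board/" || return 1
}

# Example (commented out; run_tests and start_next_build are hypothetical):
#   stage_for_test ./vmlinux ./rootfs host114 && run_tests host114 &
#   start_next_build
```

Running the tests as a background job while the next build starts in the foreground is one simple way to get the overlap between testing and building.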
Every test starts by booting a given board that has been put on our test rack. We centralize the board-specific details in the board.sh shell script, which lets each platform be referred to by a specific board name or platform, such as "Opteron 64-bit SMP" or just "host114." The test system calls board.sh to start logging the serial console output and reset power on the board. Once this stage is complete, we know that the compiler tools have produced a working, booting Linux/BSD kernel, that NFS root works, and that the compiled user utilities are healthy.
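To give a feel for what a script in the role of board.sh could contain, here is a minimal sketch. It is not the real board.sh: the function names are invented, the assumption that "Opteron 64-bit SMP" and "host114" name the same board is made only for the example, and the console and power helpers are placeholders that print what they would do.

```shell
#!/bin/sh
# Sketch only: not the actual board.sh. The name-to-host mapping and the two
# helper commands below are placeholders.

# board_host NAME: print the rack host for a board name or platform description.
board_host() {
    case "$1" in
        "Opteron 64-bit SMP" | host114) echo host114 ;;
        *) echo "unknown board: $1" >&2; return 1 ;;
    esac
}

# Placeholders: a real rack would talk to a serial console server and a
# network-controlled power switch here.
start_console_log() { echo "logging serial console of $1"; }
power_reset()       { echo "resetting power on $1"; }

# board_start NAME: begin console logging, then power-cycle the board.
board_start() {
    host=$(board_host "$1") || return 1
    start_console_log "$host"
    power_reset "$host"
}
```

Logging starts before the power reset so that the boot messages are captured from the first line. Keeping the lookup in one function means the rest of the test system never needs to know which physical machine sits behind a name.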
The test system then runs some standard tests that all platforms must pass, about 350 in all. They test for the presence of previously fixed bugs, ensure the system conforms to our software specification documents, and test the general health and functioning of the system. The tests run a total of 15 iterations. The next set of standard tests is measurement tests (see devnet.developerpipeline.com/documents/s=9854/q=1/cuj0404dougan/0404dougan.htm) that verify that performance is within expected tolerances for each platform and ensure that we have a consistent source of benchmarks. We often use those to verify that feature creep has not affected system performance, which lets us study long-term trends in our system months or years later.
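A tolerance check of the kind the measurement tests perform can be sketched with integer arithmetic in the shell. The function name, the choice of a percentage band, and the sample numbers are assumptions; the article does not state the actual units or thresholds.

```shell
#!/bin/sh
# Sketch only: decide whether a measured value is within a percentage
# tolerance of the expected value. Values are integers (for example,
# microseconds); the units and thresholds are assumptions.

# within_tolerance MEASURED EXPECTED PERCENT
# Succeeds when |MEASURED - EXPECTED| <= EXPECTED * PERCENT / 100.
within_tolerance() {
    measured=$1
    expected=$2
    percent=$3
    diff=$((measured - expected))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    # Compare diff*100 with expected*percent to avoid integer division.
    [ $((diff * 100)) -le $((expected * percent)) ]
}

# Example (commented out):
#   within_tolerance 105 100 10 && echo "within tolerance"
```

Appending each measured value to a per-platform log, alongside the pass or fail result, is one way to keep the data available for later trend analysis.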
Because not every test can be run on every platform, and some tests require multiple hosts, we encode that information in the test's shell script itself; for example, a script can state that ppc_gemini is the only platform that can run the test and that two hosts are required. Figure 1 illustrates the results during a build with some failures, some successes, and some builds in progress.
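The per-test requirements described above might be encoded along these lines. The variable names `TEST_PLATFORMS` and `TEST_HOSTS` and the `can_run` function are hypothetical, not the actual syntax our scripts use; the sketch only shows that a few shell variables are enough to carry the information.

```shell
#!/bin/sh
# Sketch only: a test script declares where it can run and how many hosts it
# needs. The variable and function names are hypothetical.

TEST_PLATFORMS="ppc_gemini"   # space-separated list; "all" means any platform
TEST_HOSTS=2                  # number of hosts the test requires

# can_run PLATFORM FREE_HOSTS
# Succeeds when PLATFORM is allowed and enough hosts are available.
can_run() {
    platform=$1
    free_hosts=$2
    [ "$free_hosts" -ge "$TEST_HOSTS" ] || return 1
    for p in $TEST_PLATFORMS; do
        if [ "$p" = all ] || [ "$p" = "$platform" ]; then
            return 0
        fi
    done
    return 1
}
```

With these declarations, `can_run ppc_gemini 2` succeeds, while `can_run ppc_gemini 1` and `can_run x86 2` both fail, so the harness can skip the test without attempting to boot anything.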