First Implementation and Some Mistakes
The first implementation of the test harness was crude because of time and resource constraints. The original machine that ran these tests was built from spare parts and had no case. (The name "Sparky" comes from our attempts to power on the board without an actual on-off switch.)
Still, the value of the system was proven within a few days. We found compile errors we didn't know existed because we had never compiled with those options enabled. We quickly added the ability to do runtime tests and expanded to what I've described here. We also needed a simple, easy way to report the results of each run. We experimented with several methods, including e-mail reports; however, e-mail was inconvenient, got mixed up with spam, and wasn't workable. We ended up using a web-page system so that we could easily access this information and catalog it in a database.
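The reporting idea described above can be sketched in a few lines: record each nightly result in a small database, then render a plain web page from it. This is a hypothetical illustration, not the original system; the table layout and field names are invented assumptions.

```python
import sqlite3

# Hypothetical sketch: catalog nightly test results in a database
# and render a minimal HTML summary page from it.
# Schema and names are illustrative, not from the original harness.

def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               run_date TEXT,
               test_name TEXT,
               status TEXT
           )"""
    )

def record(conn, run_date, test_name, status):
    # One row per test per nightly run.
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?)",
        (run_date, test_name, status),
    )

def render_report(conn, run_date):
    # Build a bare-bones HTML table of one night's results.
    rows = conn.execute(
        "SELECT test_name, status FROM results WHERE run_date = ?",
        (run_date,),
    ).fetchall()
    body = "".join(
        f"<tr><td>{name}</td><td>{status}</td></tr>" for name, status in rows
    )
    return f"<table><tr><th>Test</th><th>Status</th></tr>{body}</table>"

conn = sqlite3.connect(":memory:")
init_db(conn)
record(conn, "2009-06-01", "compile-all-options", "PASS")
record(conn, "2009-06-01", "runtime-smoke", "FAIL")
print(render_report(conn, "2009-06-01"))
```

Because every run lands in the same database, the web page and the historical queries described later come from a single source of truth.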
Testing Is Important
We were careful not to force all testing to be automated. Testing can be dull, methodical, and boring, but we did not want to lose the idea that testing is a job that everyone does. A serious concern with a system like this is that people become auditors of the system, rather than driving the tests. We avoided this, and improved our testing, by being careful; we actually write more tests now that we know they will get run regularly and attended to, instead of rarely run and never maintained.
In effect, we "outsourced" our boring and monotonous testing to a set of machines, closing the loop between writing, testing, and delivering. We installed a policy early on in this effort that every new feature or bug fix includes a test in the nightly system. We have not had a recurrence of a bug or "false fix" since then.
The important part of this was to not be draconian in our enforcement. We had a good culture of testing and wanted to improve it. Forcing people to do extra work without a good reason would likely have destroyed that. Instead, what people began to see was that they could move on to other projects more quickly when they did not have to return to an old bug and fix it over and over again.
Programmers also like to know that once they write something, they can later blame someone else for breaking it. With proof, that's much easier. This led to the natural evolution of a development cycle that is now codified in our development process:
- Write documentation.
- Write a test.
- Write an example.
- Write the code.
- Make sure the example and test work properly.
We were also able to replicate this test system in other places. Our office in India recreated a smaller test setup within a few days with few instructions. One of the unforeseen advantages of this system is that remote employees can use what we have here; all it takes is a remote login. In the past, when we needed a developer in another office to test some software on hardware they did not have (an embedded board of a certain type, for example), we would have had to purchase another board or ship the actual board to them. Either way, it would have taken days to get this done for perhaps a few hours of work. Now they can log into our system remotely, use the console, reset the board, and work on it directly.
Having a history of our tests helped us track trends. We were able to gather and display months of history easily. It was important to see performance trends in our software: was our performance increasing or decreasing? The history was also useful in determining whether our testing was getting better: it actually tested the test system itself!
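The "increasing or decreasing" question above reduces to fitting a trend line through the stored history. As a hedged sketch (the benchmark numbers below are invented, and the original system's actual metrics are not specified), a least-squares slope over monthly samples is enough to answer it:

```python
# Hypothetical sketch of trend detection over stored test history:
# fit a least-squares slope to monthly benchmark samples and report
# whether performance is improving or regressing. Data is invented.

def slope(values):
    # Least-squares slope of values against their index (0, 1, 2, ...).
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Monthly throughput numbers pulled from the results history (invented).
history = [112.0, 115.5, 114.2, 119.8, 123.1, 125.0]
print("improving" if slope(history) > 0 else "regressing")  # → improving
```

A positive slope means performance is trending up across the recorded months; the same query run against the tests' own pass rates is what let the history "test the test system itself."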
This also served as a useful tool in testing newly implemented engineering processes. Did they help, hurt, or do nothing? Software test systems aren't only for testing software; they can test anything related that ends up expressed as software.
We use this on a regular basis for performance predictions, too. How much of a performance improvement can we expect in the first three months of supporting a new platform? How long does it take for a new feature or newly supported platform to stabilize and become reliable?