In today's environment of agile software development, it is critical to ensure that developers and test engineers have as many test environments as they need to produce the best quality software with fast time-to-market. One important question is how much dev and test infrastructure is necessary.
Given the variability in demand for test environments throughout the software development process, the answer is not straightforward. This article outlines a framework, based primarily on queuing theory, that helps estimate not only how many test resources are needed, but also the cost of not having enough. It then establishes the concept of a dev/test infrastructure "efficient frontier" and illustrates how to optimize the software development process in terms of the correct amount of capacity. Finally, it explores how the introduction of the public cloud as a viable development platform affects this efficient frontier.
Variable Demand + Fixed Supply = Big Problems!
Let's start by taking a qualitative look at the problem. Anyone who has developed software knows that as you approach a project milestone or sprint, you need more test environments because more code commits are coming in. After that point, the need for testing resources decreases, then picks up again as you approach your next milestone or sprint (Figure 1). This phenomenon is what leads to the variability of demand for test environments. However, enterprises typically have a fixed supply: only a certain number of test environments are available for a given development team. As a result, just when the team needs test environments the most, they often don't have enough. Developers and test engineers end up waiting for access to test environments, and if the wait is too long, they sometimes end up not testing enough, hoping that QA will find any issues in their code. This causes all kinds of problems later in the cycle. Projects get delayed, and code quality is not as good as it could be. Often, issues are found only in staging or production, where they are harder and more expensive to fix.
Another issue, related in part to the scarcity of test infrastructure capacity, is that test environments often don't look like production environments (back in 2006, Martin Fowler listed "Test in a Clone of the Production Environment" as one of the principles of continuous integration). When capacity is limited, developers and test engineering teams cut corners, using mocks, stubs, and other simulation techniques to do the best job they can with finite resources. This again manifests itself as poor-quality code, adds an additional burden on the QA team, and results in issues being found too late in the cycle.
The trade-off is shown in the graph in Figure 2. Too many dev/test environments, and the enterprise wastes capex (capital expenditure) and opex (operating expense), because the environments sit idle most of the time. Too few environments, and the productivity of the development team suffers: developers wait and don't test enough, which has a direct cost impact. The key is to find the optimal point, the lowest total cost to the organization.
Quantifying the Dev/Test Infrastructure Efficient Frontier
We can apply queuing theory to quantify this phenomenon. Think of code commits coming in at a certain rate (arrival rate = λ commits per hour) and test environments that can process code commits at a certain rate (service rate = μ commits per hour). A code commit is routed to a test environment immediately if one is available (so the developer doesn't have to wait); otherwise, if all test environments are busy servicing other commits, the commit sits in a queue until one frees up (in which case, the developer has to wait).
With this framework, illustrated in Figure 3, we can calculate the average time a developer spends waiting and the average utilization of the test environments. Hence, we can compare the cost of the test environments versus the cost of developers waiting for a given number of test environments.
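For quick intuition, the average wait and utilization can be sketched with the classic Erlang C formula for an M/M/c queue. This is a simplification of the article's model, which uses deterministic service times (for which waits are roughly half as long), but it has a convenient closed form. The rates below (4 commits/hour arriving, 5 commits/hour per environment) are illustrative:

```python
import math

def erlang_c_wait(lam, mu, c):
    """Closed-form M/M/c results: (probability an arriving commit
    must wait, mean queueing delay in hours, per-environment
    utilization). lam = arrival rate, mu = service rate per
    environment, c = number of test environments."""
    rho = lam / (c * mu)              # utilization per environment
    if rho >= 1:
        return 1.0, float("inf"), rho  # unstable: queue grows forever
    a = lam / mu                      # offered load in Erlangs
    # Erlang C: probability that all c environments are busy
    num = (a ** c / math.factorial(c)) / (1 - rho)
    den = sum(a ** k / math.factorial(k) for k in range(c)) + num
    p_wait = num / den
    wq = p_wait / (c * mu - lam)      # mean wait in the queue (hours)
    return p_wait, wq, rho

# Illustrative numbers: 4 commits/hour in, 5 commits/hour per environment
for c in (1, 2, 3):
    p, wq, rho = erlang_c_wait(4, 5, c)
    print(f"{c} env(s): P(wait)={p:.2f}, avg wait={wq * 60:.1f} min, util={rho:.0%}")
```

Note how steeply the wait falls with each added environment: a single environment at 80% utilization makes every commit wait on average nearly an hour, while a second environment cuts the average wait to a couple of minutes.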
To illustrate this concept through a numerical example, let's make some assumptions (based in part on my observations of our own application development processes at Ravello, the company where I work).
General assumptions: Say you have a small team of 10 developers and test engineers who are developing additional features for your application, and your application is currently 300,000 lines of Java code. From your experience, the project will take approximately 12 months and will involve roughly 5,000 code commits (assuming the process involves a centralized source control with commits to mainline only). The application itself runs on 7 virtual machines: 2 Web servers, 2 application servers, 2 database servers, and a message queue. In addition, the production environment has a few network elements like load balancers, firewalls, and routers. As far as possible, you would like all testing (except unit testing) to be done on environments that are as close to the production environment as possible.
Development process assumptions: Currently, your organization follows a process where developers do unit testing on their local machines and then commit their code. This triggers a build, which is smoke tested on a scaled down version of the production environment (in order to save on cost). If it passes, it goes to the integration-testing phase that is done in a larger environment. If that passes, then it is queued up to be system tested nightly with other commits.
Everything is good on an "average" day: Under normal conditions, the development team has a commit rate of λ = 4 commits per hour. Integration tests take 12 minutes, which means the service rate is μ = 5 commits per hour. From our M/D/s queue simulation (Poisson arrivals, deterministic service times, s servers), we can derive the average time developers would wait for a given number of test environments. The results of the simulation are shown in Figure 5.
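A minimal Monte-Carlo sketch of such an M/D/s simulation might look like the following. This is an illustration of the technique, not Ravello's actual simulation code; it draws Poisson arrivals at λ = 4 per hour, serves each commit for a fixed 0.2 hours (12 minutes), and averages the resulting waits:

```python
import random
import statistics

def simulate_mds(lam, service_time, servers, n_commits=200_000, seed=1):
    """Estimate the mean developer wait (in hours) in an M/D/s queue:
    Poisson arrivals at rate lam (per hour), fixed service_time (hours),
    'servers' parallel test environments, first-come first-served."""
    rng = random.Random(seed)
    free_at = [0.0] * servers          # time each environment next frees up
    t, waits = 0.0, []
    for _ in range(n_commits):
        t += rng.expovariate(lam)      # next commit arrives
        i = min(range(servers), key=free_at.__getitem__)
        start = max(t, free_at[i])     # wait if all environments are busy
        waits.append(start - t)        # time the developer spent waiting
        free_at[i] = start + service_time
    return statistics.mean(waits)

# The article's numbers: lam = 4 commits/hour, 12-minute (0.2 h) tests
for s in (1, 2, 3):
    w = simulate_mds(4, 0.2, s)
    print(f"{s} environment(s): average wait = {w * 60:.1f} minutes")
```

As a sanity check, the single-server case has an exact answer: for M/D/1, the mean wait is ρD/(2(1−ρ)) = 0.8 × 0.2 / 0.4 = 0.4 hours, i.e. 24 minutes, and the simulation converges to that.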
Now we can simply translate developer waiting time into cost by assuming each developer is paid $150K per year. In addition, we can calculate the marginal cost of each additional test environment. This allows us to plot our efficient frontier, illustrated in Figure 6.
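The translation works out to: total hourly cost = (environments × hourly environment cost) + (commit rate × mean wait × developer hourly rate). A toy version of that calculation is below; it uses the M/M/c closed form as a stand-in for the simulated M/D/s waits, and the $12/hour environment cost and 2,000 work-hours per year are invented assumptions, so the resulting numbers differ from the article's Figure 6:

```python
import math

def avg_wait(lam, mu, c):
    """Mean queueing delay (hours) for an M/M/c system via Erlang C;
    a closed-form stand-in for the simulated M/D/s waits."""
    a, rho = lam / mu, lam / (c * mu)
    if rho >= 1:
        return float("inf")            # unstable: infinite wait
    num = a ** c / math.factorial(c) / (1 - rho)
    den = sum(a ** k / math.factorial(k) for k in range(c)) + num
    return (num / den) / (c * mu - lam)

DEV_RATE = 150_000 / 2_000   # ~$75/hour for a $150K/year developer (assumed)
ENV_COST = 12.0              # assumed hourly cost of one test environment
LAM, MU = 4, 5               # commits/hour in, commits/hour per environment

def total_cost(c):
    # each of the LAM hourly commits costs avg_wait hours of developer time
    return c * ENV_COST + LAM * avg_wait(LAM, MU, c) * DEV_RATE

for c in range(1, 5):
    print(f"{c} environment(s): ${total_cost(c):,.2f} per hour")
print("cheapest:", min(range(1, 8), key=total_cost), "environments")
```

The shape is what matters: waiting cost dominates at low capacity, environment cost dominates at high capacity, and the minimum of their sum is the efficient-frontier point.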
As you can see, the optimal point is 2 smoke test environments, 2 integration test environments, 1 system testing environment, and 2 manual QA environments. Any more capacity, and we would have paid too much. Any less, and the cost of developers waiting would exceed the marginal cost of a new test environment.
Reality Bites: Version Day Mayhem
In reality, the assumption of 4 commits per hour is misleading (see the commit log in Figure 7, based on an actual Ravello log). As you can see in the sample commit log, the commit rate peaks at 20+ commits per hour close to release time (end of sprint); and often, it's even higher than that. At Ravello, our commit logs have shown a peak of 27 commits per hour.
As a result, in our conceptual framework, the "demand curve" shifts to the right, and the optimal point shifts with it, which means that we need more test environments (Figure 8).
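One way to see the shift numerically is to re-run a small M/D/s wait simulation at both the average and the peak commit rates and pick the cheapest environment count under a cost model. The dollar figures below are illustrative assumptions, not Ravello's actual costs:

```python
import random

def sim_wait(lam, service_h, servers, n=50_000, seed=1):
    """Simulated mean developer wait (hours) in an M/D/s queue."""
    rng = random.Random(seed)
    free_at = [0.0] * servers         # when each environment next frees up
    t, total = 0.0, 0.0
    for _ in range(n):
        t += rng.expovariate(lam)               # Poisson arrivals
        i = min(range(servers), key=free_at.__getitem__)
        start = max(t, free_at[i])              # wait if all envs are busy
        total += start - t
        free_at[i] = start + service_h          # deterministic service
    return total / n

DEV_RATE, ENV_COST = 75.0, 12.0       # assumed $/hour figures

def total_cost(lam, c):
    return c * ENV_COST + lam * sim_wait(lam, 0.2, c) * DEV_RATE

for lam in (4, 20):                   # average vs. peak commit rate
    c_min = int(lam * 0.2) + 1        # smallest stable environment count
    best = min(range(c_min, c_min + 8), key=lambda c: total_cost(lam, c))
    print(f"lambda = {lam}/hour: cheapest total cost with {best} environments")
```

With the same 12-minute tests, quintupling the commit rate pushes the cost-minimizing number of environments up severalfold: the peak, not the average, is what sizes the frontier.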
If we had kept the number of test environments sized for the average case, developers would have waited for access, substantially decreasing the efficiency and throughput of our application development team. Developers wouldn't test enough because of the long wait, thus beginning the suboptimal process of just checking in code and hoping that QA will uncover any serious issues.