Many QA engineers I've known have had a love/hate relationship with beta testing. "It's expensive to administer and doesn't give me useful information," said one. "Too many bugs slip through this so-called 'beta testing,' so I have to test the whole app in the lab anyway," said another, "but if Marketing wants it, I can't stop 'em."
On the other hand, some organizations continue to see value in beta testing: "We need to plan for three weeks of beta phase to make sure we get better coverage on use cases and environments that we can't test against ourselves."
Used judiciously, beta testing programs can be valuable, but modern software development practices challenge the whole notion of beta testing. How do you fit beta testing cycles into already-compressed release cycles or frequent releases? And as user reviews and ratings of applications become more transparent through social and app store review channels, the definition of "app quality" is slowly morphing from functional correctness to user-perceived value. Doesn't that change the entire premise of a beta testing program?
There are many problems associated with traditional beta testing:
- Noise: Beta testing often generates a large volume of feedback that is neither accurate nor actionable.
- Inconsistent participation: There is either too much or too little, and programs are often administered with poor processes for collecting and analyzing feedback. Not all use cases get covered, so bugs slip through.
- Good catches but insufficient information: Even when bugs are identified, the reports are often not useful because they lack sufficient information to reproduce the defect.
- Delay: Beta testing slows the release cycle by having a dedicated phase before the production release.
In addition to these problems, several modern deployment practices are making beta testing less attractive.
Replacing Beta Testing
These modern deployment practices include everything from lean development, which favors small batch releases that eschew the phased model of development, to deployment methods that enable apps on mobile and desktop platforms to be updated automatically. In addition, the following trends are pressuring beta testing:
- Dogfooding: When staff at a company test their own software internally before release by using it day-to-day, whether for work or pleasure, it helps identify issues early without the embarrassment and brand damage of a faulty public release. When the developers themselves are the initial users, the user-feedback loop is immediate, resulting in software with better quality and utility. However, depending on the user profile for this "dogfooding," such programs can encounter problems similar to those of traditional beta programs: Users are often not professional testers, bug reporting may be inconsistent, and the testing does not cover all use cases (for example, new-user registration flows).
- Staged roll-out: This is the most basic approach to modern software deployment, in which code is tested and monitored for quality before broad release. It can take several different forms. For a website, a feature may be released to a small number of initial users while activity is closely monitored. For a mobile app, an application may be initially released only to a small market to monitor quality and feedback. Sometimes the staged roll-out approach is a "beta program in disguise": Variations in the actual execution can bring it closer to a traditional beta program.
- Partial roll-out: This is similar to a staged roll-out: A large, clustered system deploys new code to a small fraction of servers. There is automated, active monitoring of those servers, and if anything goes wrong, the "immune system" detects the problem, and automatically rolls back offending changes.
- Testing in production (TiP): This practice, in which testing happens after a product is put into production, is a controversial topic among QA professionals. It can be complementary to up-front testing or used as a means to shift the timing of quality testing from before to after deployment.
- Dark launch: Facebook popularized this approach with the launch of their chat service. Without revealing the feature in the UI, their Web application silently generated load to the chat server, simulating the load the service had to process, readying the infrastructure before the real launch.
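To make the staged and partial roll-out ideas above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular company's system. The function names (`in_rollout`, `should_roll_back`), the 100-bucket scheme, and the thresholds are all hypothetical. The first function deterministically assigns a user to a roll-out percentage, and the second plays the role of a very simple "immune system" check that compares the error rate on the servers running new code against a known baseline.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets.

    Hashing the feature name together with the user ID keeps the assignment
    stable across requests, so raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in the range 0..99
    return bucket < percent


def should_roll_back(canary_errors: int,
                     canary_requests: int,
                     baseline_error_rate: float,
                     tolerance: float = 2.0,
                     min_requests: int = 100) -> bool:
    """Decide whether the new code should be rolled back.

    Assumed policy: roll back when the error rate on the servers running the
    new code exceeds `tolerance` times the baseline rate, but only once
    enough requests have been observed to make the comparison meaningful.
    """
    if canary_requests < min_requests:
        return False  # not enough data yet
    canary_rate = canary_errors / canary_requests
    return canary_rate > baseline_error_rate * tolerance
```

In practice, a deployment system would call something like `should_roll_back` periodically against live monitoring data; the fixed multiplier used here is only one of many possible policies.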
Traditional beta testing continues to have a place for certain scenarios, such as when the cost of a buggy release and deploying a fix is very high. Beta programs are also useful when they can work as an early seeding program. (Gamers, for example, love being invited to betas.)
Beta Programs in the Modern Age
In the new world of continuous deployment and app stores, companies would do well to re-examine the focus and the goal of beta programs: moving the "functional testing in the wild" burden from only beta testing to including alternative options; using technology to augment the (beta) testers for collecting useful information; and incorporating a quality assurance mentality and associated procedures to areas other than functional correctness.
With the advent of crowdsourced testing, or what is often referred to as "expert-sourcing" because it often utilizes vetted and trained QA professionals, development organizations can now get the benefit of in-the-wild testing without the downside of beta testing's high noise level. This option offers companies the ability to test pre-deployment under real-world conditions and, in particular, to address the difficult problem of mobile device fragmentation: OS versions, mobile carriers, memory and other mobile device configurations, and location diversity. Typically, the vendors will hire a test company's members in specific locales to beta test the software and report defects via agreed-upon forms and channels.
Application instrumentation is a technique that only sophisticated dev shops implemented in the past. New tools, including Crashlytics, Apphance, and others, allow for crash reporting and user feedback directly from devices via simple instrumentation steps. By enabling testers to send screenshots and reproduction steps with each report, and by automatically collecting logs and other environmental data accompanying bugs or crashes, these tools spare the development team from having to decipher poorly written beta testing bug reports (such as "application crashed suddenly").
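As a rough illustration of what such instrumentation collects, the following Python sketch assembles a crash report from an exception, recent log lines, and basic environment data. It does not reflect the API of Crashlytics, Apphance, or any other product. The class `RecentLogHandler`, the function `build_crash_report`, and the report fields are hypothetical names chosen for this example, and only the Python standard library is used.

```python
import logging
import platform
import traceback
from collections import deque
from datetime import datetime, timezone


class RecentLogHandler(logging.Handler):
    """Keeps only the most recent `capacity` formatted log lines in memory."""

    def __init__(self, capacity: int = 50) -> None:
        super().__init__()
        self.lines = deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def build_crash_report(exc: BaseException, recent_logs, app_version: str) -> dict:
    """Bundle the context a developer needs to reproduce a crash."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "app_version": app_version,
        "exception_type": type(exc).__name__,
        "message": str(exc),
        # Full stack trace as a list of text lines
        "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        # Environment data collected automatically, not typed in by the tester
        "environment": {
            "os": platform.system(),
            "os_version": platform.release(),
            "python": platform.python_version(),
        },
        "recent_logs": list(recent_logs),
    }
```

An application would attach a `RecentLogHandler` to its logger at startup and call `build_crash_report` from its top-level exception handler, then send the resulting dictionary to whatever collection service it uses. Compared with a report that says only "application crashed suddenly," the stack trace, environment, and recent logs arrive without any effort from the tester.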
Finally, advanced analytics tools, such as Flurry and Applause, give managers implicit behavioral information and explicit feedback from users of apps in production to make real-time business decisions. Managers use these tools to go beyond app star ratings and drill down on categorized attributes of individual user reviews. As a result, companies can analyze their app's performance and user sentiment to easily recognize issues that require action.
By combining these techniques, development organizations can now get useful information about their products, whether in development or in production, and respond intelligently based on real feedback without relying too heavily on traditional beta programs.
So, is beta testing dead? The answer is yes for some organizations, but not for everyone. For companies that want to move fast to remain relevant and keep customers loyal, these new practices help reduce the release cycle by reducing reliance on long beta periods.
Fumi Matsumoto is a beta testing expert and the CTO of uTest.