The Death of Beta Testing?


Many QA engineers I've known have had a love/hate relationship with beta testing. "It's expensive to administer and doesn't give me useful information," said one. "Too many bugs slip through this so-called 'beta testing,' so I have to test the whole app in the lab anyway," said another, "but if Marketing wants it, I can't stop 'em."

On the other hand, some organizations continue to see value in beta testing: "We need to plan for three weeks of beta phase to make sure we get better coverage on use cases and environments that we can't test against ourselves."

Used judiciously, beta testing programs can be valuable, but modern software development practices challenge the whole notion of beta testing. How do you fit beta testing cycles into already compressed release cycles or frequent releases? And as user reviews and ratings of applications become more transparent through social and app store review channels, the definition of "app quality" is slowly morphing from functional correctness to user-perceived value. Doesn't that change the entire premise of a beta testing program?

There are many problems associated with traditional beta testing:

  • Noise: Beta testing often generates a large volume of feedback that is neither accurate nor actionable.
  • Inconsistent participation: Turnout is either too high or too low, and programs are often administered with poor processes for collecting and analyzing feedback. Not all use cases get covered, so bugs slip through.
  • Good catches but insufficient information: Even when bugs are identified, the reports are often not useful because they lack sufficient information to reproduce the defect.
  • Delay: Beta testing slows the release cycle by having a dedicated phase before the production release.

In addition to these problems, several modern deployment practices are making beta testing less attractive.

Replacing Beta Testing

These modern deployment practices include everything from lean development, which favors small batch releases that eschew the phased model of development, to deployment methods that enable apps on mobile and desktop platforms to be updated automatically. In addition, the following trends are pressuring beta testing:

  • Dogfooding: When staff at a company test their own software internally before releasing by using it day-to-day, whether for work or pleasure, it helps identify issues early without the embarrassment and brand damage of a faulty public release. When the developers themselves are the initial users, the user-feedback loop is immediate, resulting in software with better quality and utility. However, depending on the user profile for this "dogfooding," such programs can encounter similar problems to traditional beta programs — users are often not professional testers, bug reporting may be inconsistent, and the testing does not cover all use cases (for example, new user registration flows, and the like).
  • Staged roll-out: This is the most basic approach to modern software deployment in which code is tested and monitored for quality before broad release. It can take several different forms; for a website, a feature may be released to a small number of initial users, while activity is closely monitored. For a mobile app, an application may be initially released only to a small market to monitor quality and feedback. Sometimes the staged roll-out approach is a "beta program in disguise" — variations on the actual execution can put this closer to a traditional beta program.
  • Partial roll-out: This is similar to a staged roll-out: A large, clustered system deploys new code to a small fraction of servers. There is automated, active monitoring of those servers, and if anything goes wrong, the "immune system" detects the problem, and automatically rolls back offending changes.
  • Testing in production (TiP): This practice — testing after a product is put into production — is a controversial topic among QA professionals. It can be complementary to up-front testing or used as a means to shift the timing of quality testing from before to after deployment.
  • Dark launch: Facebook popularized this approach with the launch of their chat service. Without revealing the feature in the UI, their Web application silently generated load to the chat server, simulating the load the service had to process, readying the infrastructure before the real launch.
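
The partial roll-out idea above (expose new code to a small slice, monitor it, and roll back automatically when something goes wrong) can be sketched roughly as follows. This is a minimal illustration, not any particular company's deployment system: the names `in_rollout` and `RolloutMonitor`, the 5% error threshold, and the minimum sample size are all assumptions chosen for the example.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically assign a user to one of 100 buckets and expose
    the feature only to buckets below `percent` (hypothetical helper)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


class RolloutMonitor:
    """Toy "immune system": counts outcomes from the new code path and
    signals a rollback when the error rate exceeds a threshold."""

    def __init__(self, max_error_rate: float = 0.05, min_requests: int = 20):
        self.max_error_rate = max_error_rate  # assumed threshold
        self.min_requests = min_requests      # avoid reacting to tiny samples
        self.requests = 0
        self.errors = 0

    def record(self, ok: bool) -> None:
        self.requests += 1
        if not ok:
            self.errors += 1

    def should_roll_back(self) -> bool:
        if self.requests < self.min_requests:
            return False
        return self.errors / self.requests > self.max_error_rate


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    exposed = [u for u in users if in_rollout(u, "new_chat", percent=5)]
    print(f"{len(exposed)} of {len(users)} users see the new code")

    monitor = RolloutMonitor()
    for n, _user in enumerate(exposed):
        # Simulated outcome: every fourth request to the new code fails.
        monitor.record(ok=(n % 4 != 0))
    print("roll back" if monitor.should_roll_back() else "keep rolling out")
```

Hashing the user ID together with the feature name keeps each user's assignment stable across requests while giving different features independent slices of the user base. A real system would also need to decide what "rolling back" means operationally, which this sketch leaves out.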

Traditional beta testing continues to have a place in certain scenarios, such as when the cost of a buggy release, and of deploying a fix for it, is very high. Beta programs are also useful when they can work as an early seeding program. (Gamers, for example, love being invited to betas.)

Beta Programs in the Modern Age

In the new world of continuous deployment and app stores, companies would do well to re-examine the focus and the goal of beta programs: moving the "functional testing in the wild" burden from only beta testing to including alternative options; using technology to augment the (beta) testers for collecting useful information; and incorporating a quality assurance mentality and associated procedures to areas other than functional correctness.

With the advent of crowdsourced testing, or what is often referred to as "expert-sourcing" because it often utilizes vetted and trained QA professionals, development organizations can now get the benefit of in-the-wild testing without the downside of beta testing's high noise level. This option offers companies the ability to test pre-deployment under real-world conditions and, in particular, to address the difficult problem of mobile device fragmentation: OS versions, mobile carriers, memory and other device configurations, and location diversity. Typically, the vendors will hire a test company's members in specific locales to beta test the software and report defects via agreed-upon forms and channels.

Application instrumentation is a technique that only sophisticated dev shops implemented in the past. New tools including Crashlytics, Apphance, and others allow for crash reporting and user feedback directly from devices via simple instrumentation steps. These tools let testers send screenshots and reproduction steps with each report, and they automatically collect the log and other environmental data accompanying bugs or crashes. That makes the development team's job much easier, because developers no longer have to decipher poorly written beta testing bug reports (such as "application crashed suddenly").
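
To make the idea of instrumentation concrete, here is a small, generic Python sketch of what such a tool does when a crash occurs: it bundles the exception, the stack trace, basic environment details, and the most recent log lines into one structured report. It does not use or imitate the API of Crashlytics, Apphance, or any other product; `RecentLogHandler`, `build_crash_report`, and the report fields are hypothetical names invented for this example.

```python
import json
import logging
import platform
import traceback
from collections import deque
from datetime import datetime, timezone


class RecentLogHandler(logging.Handler):
    """Keeps the last `capacity` formatted log lines in memory."""

    def __init__(self, capacity: int = 50):
        super().__init__()
        self.lines = deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def build_crash_report(exc: BaseException, recent_logs, app_version: str) -> dict:
    """Collect what a developer needs to reproduce a crash (illustrative fields)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "app_version": app_version,
        "exception_type": type(exc).__name__,
        "message": str(exc),
        "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "environment": {
            "os": platform.system(),
            "os_version": platform.release(),
            "python": platform.python_version(),
        },
        "recent_logs": list(recent_logs),
    }


if __name__ == "__main__":
    handler = RecentLogHandler()
    logger = logging.getLogger("demo_app")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    logger.info("user opened settings screen")
    try:
        {}["missing"]  # simulated defect
    except KeyError as exc:
        report = build_crash_report(exc, handler.lines, app_version="1.2.3")
        print(json.dumps(report, indent=2))
```

Compared with a report that says only "application crashed suddenly," a record like this tells the developer what failed, where, on which platform, and what the user did just beforehand. A production tool would also upload the report and attach screenshots, which are omitted here.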

Finally, advanced analytics tools, such as Flurry and Applause, give managers implicit behavioral information and explicit feedback from users of apps in production to make real-time business decisions. Managers use these tools to go beyond app star ratings and drill down on categorized attributes of individual user reviews. As a result, companies can analyze their app's performance and user sentiment to easily recognize issues that require action. 

By combining these techniques, development organizations can now get useful information about their products in development or in production, and respond intelligently to real feedback without relying too heavily on traditional beta programs.

So, is beta testing dead? The answer is yes for some organizations, but not for everyone. For companies that want to move fast to remain relevant and keep customers loyal, these new practices help reduce the release cycle by reducing reliance on long beta periods.


Fumi Matsumoto is a beta testing expert and the CTO of uTest.


Comments:

AndrewBinstock
2013-03-26T01:00:12

"Short release cycles require short and efficient testing, not the absence of testing, and adding or removing the word "beta" to it does not change anything...I've never heard of testing that is not staged in some way, so that the effects of the first-found bugs are minimized."

I believe the author is specifically referring to beta test cycles in which the pre-release product is sent to lots of customers for them to look at and exercise. There is no real way to do those kinds of beta tests in the short release cycle model. That and the other factors mentioned by the author have dramatically changed beta testing.


ubm_techweb_disqus_sso_-9c052414ac9da9eb68a0f7ba526e1d76
2013-03-21T17:41:10

The testing problems described as "dead" here are not new. They sound just like those of a development group that has gone static and is not really paying attention to what will improve the testing process. In my experience, all good testing involves staged rollouts, and always has. After all, "beta" comes after alpha: first we test internally (alpha), then externally (beta). "Dogfooding" just means that you integrate the developers into the testing process -- if you are not doing that as part of your "alpha" or pre-alpha "developer" testing (at least in my domain of application software), then forget about it and give up your job. Hasn't it long been known that developers need to be integrated into the analysis, design, usability, and testing parts of development to get the best results (as the Agile folks keep telling us)? In my experience, yes. But of course, in my experience, newbies get into software every day and don't yet know what most of us experienced folks know. Short release cycles require short and efficient testing, not the absence of testing, and adding or removing the word "beta" to it does not change anything...I've never heard of testing that is not staged in some way, so that the effects of the first-found bugs are minimized. An effective Beta Test by any other name smells just as sweet (did I just butcher Shakespeare?). So go ahead and kill your old, slow, ineffective test method, but don't think that you can get away with not testing, no matter what you call it.


dblake950
2013-03-21T14:27:08

"The definition of app quality is slowly morphing from functional correctness to user-perceived value" hits the nail on the head. Organizations have moved to methods like dogfooding in large part due to social media and ubiquitous connectivity/commenting. We now live in a world where bad word of mouth, regardless of whether it's valid, can kill a project, so why take the risk?

