
Nick Plante

Dr. Dobb's Bloggers

Running Experiments

July 18, 2010

One of the great things about the web as a delivery mechanism is that it's so quick and easy to roll out new messaging and features. This means that it's also possible to track user satisfaction and get feedback on those changes quickly, and then to iterate on them for constant improvement. It makes experimentation easy.

Lately, there's been a lot of discussion about using A/B or split testing strategies for choosing marketing copy, images, signaling, and placement. Pioneers in this space of course include names like Amazon, eBay, and Google. If you're developing a web-based application, there's a good chance you'll want to look into it too.

With a good A/B testing strategy you can send website traffic to different page variants, track goals accomplished by submitting a form or clicking a link, and from that data draw a logical conclusion about whether the bright green button or the subtle blue one is more likely to appeal to your audience. This may sound simple, but if the difference yields an 8% increase in user conversion, it can have a dramatic effect. Similar experiments are run on marketing copy, visual layouts, contact forms ("how much data should we require?"), and so on.
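The mechanics described above can be sketched in a few lines of plain Ruby. This is a hypothetical illustration, not any particular framework's API: visitors get a random variant, impressions and conversions are tallied, and the per-variant conversion rates are compared at the end.

```ruby
# Minimal sketch of a two-variant split test (illustrative, framework-free).
class SplitTest
  def initialize(name, variants)
    @name = name
    @variants = variants
    @impressions = Hash.new(0)
    @conversions = Hash.new(0)
  end

  # Pick a variant for a new visitor and count the impression.
  def assign
    variant = @variants.sample
    @impressions[variant] += 1
    variant
  end

  # Record a goal (form submitted, link clicked) for the variant shown.
  def convert!(variant)
    @conversions[variant] += 1
  end

  # Conversion rate per variant, for comparing outcomes.
  def rates
    @variants.to_h do |v|
      shown = @impressions[v]
      [v, shown.zero? ? 0.0 : @conversions[v].fdiv(shown)]
    end
  end
end
```

In use: `test = SplitTest.new("button_color", %w[green blue])`, call `assign` when rendering the page, `convert!` when the goal fires, and inspect `rates` once enough traffic has accumulated.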

You've probably noticed that A/B testing features are increasingly found in online marketing services. Google, for example, allows you to perform selective audience testing with AdWords, where different visitors see different pages. MailChimp allows you to run A/B tests on your mailing list campaigns. If you want to run your own tests, there are hosted services like Performable and Optimizely that are easy to get started with. Easy enough, in fact, for marketing folks with no technical background to grasp when testing landing pages.

But, as developers, we may be more interested in something at the library level that we can integrate directly into our apps. If that's the case there are a number of homegrown ways to structure effective split testing as well as some very nice open source "experiment frameworks" like Vanity and A/Bingo that you can integrate right into your own code.

This is data geek heaven, in a lot of ways. Not only do these frameworks come with nice visual result interfaces, but you can easily define your own metrics and place your own conversion tracking anywhere you need it. If you like, you can grab the raw data from the experiments and manipulate it as you see fit. Having more control, and having something like this tightly integrated into your application, can be quite useful. Vanity, for example, lets you manage identity as part of the testing process, which means you can show random variants to everyone, or consistently show the same variant to the same person, or to the same class of users.
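The identity-aware behavior is usually achieved by hashing a stable visitor identity into a variant index, so the same user lands on the same variant every time. A minimal sketch of that idea (illustrative only, not Vanity's actual API):

```ruby
require "digest"

# Deterministic variant assignment: hash "experiment:identity" and use the
# result to index into the variant list. Same identity, same variant, always.
def variant_for(experiment, identity, variants)
  digest = Digest::MD5.hexdigest("#{experiment}:#{identity}")
  variants[digest.to_i(16) % variants.size]
end
```

Because the assignment is a pure function of the experiment name and the identity, it needs no per-user storage, and an identity can just as easily be a class of users (e.g. a plan name) as an individual id.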

A/B testing gurus tend to stress making tests as simple as possible in order to be sure of the outcome (for more complex test scenarios, read up on multivariate testing). But what about testing actual website features? Selective feature deployment has also gotten some good attention lately. A few months back, Ross Harmes over at Flickr wrote about their use of flags and flippers to roll out and roll back features under test. Forrst uses a similar technique that they refer to as buckets. Rails developer Alan deLevie followed this up with a post about how to run these sorts of feature rollout experiments yourself in Rails.
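Stripped to its essentials, the flag-and-flipper (or "bucket") idea is just a lookup: a feature is flipped on globally, flipped on for a whitelist of users, or off. A hypothetical sketch, with all names illustrative:

```ruby
# Toy feature-flag registry in the spirit of flags/flippers/buckets.
class FeatureFlags
  def initialize
    @global  = Hash.new(false)                 # feature => on for everyone?
    @buckets = Hash.new { |h, k| h[k] = [] }   # feature => whitelisted user ids
  end

  # Flip a feature on for all users.
  def enable(feature)
    @global[feature] = true
  end

  # Flip a feature on for one specific user only.
  def enable_for(feature, user_id)
    @buckets[feature] << user_id
  end

  # A feature is active if it's on globally or on for this particular user.
  def active?(feature, user_id = nil)
    @global[feature] || @buckets[feature].include?(user_id)
  end
end
```

Application code then guards the new path with `if flags.active?(:new_search, current_user.id)`, which is what makes rollback a one-line flip rather than a redeploy.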

Rick Olson of GitHub has also weighed in on the subject, discussing how they rely on Redis to support feature rollout, reducing the number of small source-code configuration tweaks that need to be made. And now James Golick has published a new project called rollout that packages the same philosophy into an easily reusable library, while also supporting selective feature rollout by user. The upside is clear: you can roll out features to a select group of users, find bugs, and track their usage of the changes (no matter how small) before subjecting a larger audience to them.
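Percentage-based rollout of this kind can be sketched without the Redis dependency. The sketch below is in the spirit of such libraries but is my own illustration: each user id hashes to a stable bucket between 0 and 99, a feature at 10% is active for users whose bucket falls below 10, and those users stay included as the percentage grows.

```ruby
require "zlib"

# Stable percentage rollout: hash the user into a bucket 0..99 and compare
# against the current rollout percentage. Raising the percentage only ever
# adds users; it never kicks out someone who already has the feature.
def rolled_out?(feature, user_id, percentage)
  bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
  bucket < percentage
end
```

Persisting just the percentage (in Redis, a config file, wherever) is enough, since the bucket itself is recomputed deterministically on every check.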

This sort of selective feature testing is not entirely dissimilar to what A/B testing accomplishes, but at a different level. Of course, the A/B testing purists would note that the results in this case are harder to analyze and draw any solid conclusions from, simply because the experiments are far more complex. It'll be interesting to continue to watch the advances in this space. The easier it is to track user satisfaction with a website change -- whether it's a new marketing callout, a new workflow, or a small new feature -- the easier it becomes to build a quality service that people actually want to use. And that's what we're all trying to do as web developers, after all.
