eBay Controls the Chaos, Why Can't We?


Jonathan is CEO of Replay Solutions. He can be contacted at jonathan@replaysolutions.com.


It occurs to me that as web applications have taken over the world, we still have a heck of a lot to learn about managing them. Complexity in the software industry has skyrocketed, and applications that once were versioned and released every six months are now being upgraded with new production code every couple of weeks. eBay is a prime example of this model. It is widely known that online auctioneer eBay has become so adept at addressing issues and deploying changes to its servers that every two weeks, a whole new version of eBay is up and running.

Now, eBay is not a simple application, and downtime at eBay can cost millions of dollars, not to mention generate an angry mob of users who are not only desperately trying to buy the latest iPhone 3G, but some of whom are also trying to make a living. This is serious stuff. Downtime at eBay is front-page news.

It's redundant to point out the dangers of our new Software-as-a-Service (SaaS) software paradigm. However, the one-to-many relationship between application server and end-users does present pitfalls that did not exist in Bill Gates's world. When Microsoft Outlook crashes, you may be upset, say something not very nice, then restart it. When your application servers crash, several thousand of your users may all collectively say something not very nice about you.

So why aren't more companies able to follow the eBay model? What does eBay know that they don't?

I've spent time thinking about this, and the answer may lie not just in how they deploy new changes, but how they resolve issues when they occur. Is this truly one of the great remaining challenges in the realm of software? Some would say so, and they have good reasons to back that up.

Multitier applications represent some of the greatest levels of complexity ever seen in the software industry. With pieces of your application running on many heterogeneous, physically dispersed servers and environments, understanding what went wrong can be next to impossible. When issues occur, most often the only hope a team has is to reproduce the conditions that caused the error and hope it happens again. This means that to understand the root cause of an issue, the team has to recreate the environment, repopulate the database, and generate the required load on the servers. Frequently, the pain of going through this effort is too great, and the issues lie dormant...until the next time something bad happens!

What the software industry has been screaming out for is the ability to quickly capture, reproduce, and isolate issues as they occur. What we need is something like "TiVo for Software."

When I thought about starting a company around the concept of recording and replaying software execution, I did not initially think about all the mechanisms replay technology could eventually replace. In 2004, we started Replay Solutions with a technology to record not only an application's execution but, just as importantly, the complex environment in which the application ran. With this ability, teams can dispense with many inefficient workflows that have traditionally been manual, iterative, and error-prone.

Imagine this scenario: Your newly outsourced team in India is handling QA for your complex, multitier application. They're doing a great job and have found over 100 issues with your application. You've got the problem reports, log files, and the very large database datasets that your application was using when the bad things happened. Next comes the fun part. Now it's your turn to bring up the same environment that your outsourced team was running. I hope you're using virtual servers! Finally, let's take a shot at generating the same load on the application that existed when the problem occurred. Hopefully, the moons have aligned, and your fingers are crossed...

Now let's fast-forward. Your outsourced team in India is using your recording system. You arrive in the morning, log on to your defect tracking system, load the recording of an issue they found, and press 'play.' This time, every event that affected your application in that complex environment, including output from your authentication, LDAP, caching, and e-commerce servers, has all been recorded and stored. Even the database and its dataset are no longer required. Most importantly, the end-user traffic that ultimately triggered the problem has been recorded as well. All of these elements are perfectly reproduced, allowing you to focus on the most important thing: What went wrong.
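To make the idea concrete, here is a minimal sketch of record and replay at the level of a single external call. It is an illustration only, not how Replay Solutions' product works: the names `Recorder`, `Replayer`, `handle_request`, and `live_price_service` are all hypothetical, and a real system would have to capture far more (threads, timing, network traffic). The point it shows is that once every response from the outside world has been logged, the application can be rerun without the outside world being present.

```python
import json
from typing import Any, Callable, Dict, List


class Recorder:
    """Hypothetical helper: calls the real dependency and logs each response."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def call(self, name: str, func: Callable[..., Any], *args: Any) -> Any:
        result = func(*args)
        self.events.append({"name": name, "args": list(args), "result": result})
        return result

    def dump(self) -> str:
        # Serialize the log so it could be attached to a defect report.
        return json.dumps(self.events)


class Replayer:
    """Hypothetical helper: serves logged responses in order.

    The real dependency is never called during replay.
    """

    def __init__(self, serialized: str) -> None:
        self.events = json.loads(serialized)
        self.position = 0

    def call(self, name: str, func: Callable[..., Any], *args: Any) -> Any:
        event = self.events[self.position]
        if event["name"] != name or event["args"] != list(args):
            raise RuntimeError(f"replay diverged at event {self.position}")
        self.position += 1
        return event["result"]


def handle_request(io: Any, user: str, lookup_price: Callable[[str], int]) -> str:
    """Toy application logic that makes one external call per request."""
    price = io.call("lookup_price", lookup_price, user)
    return f"{user} pays {price}"


def live_price_service(user: str) -> int:
    # Stand-in for a remote server; an assumption made for this sketch.
    return len(user) * 10


def unavailable(user: str) -> int:
    raise AssertionError("the live service must not be called during replay")


# Record once against the "live" environment...
recorder = Recorder()
live_output = handle_request(recorder, "alice", live_price_service)

# ...then replay from the log alone, with the service unavailable.
replayed_output = handle_request(Replayer(recorder.dump()), "alice", unavailable)
assert live_output == replayed_output
```

The divergence check in `Replayer.call` is a design choice worth noting: if the replayed run asks for something the recording does not contain, failing loudly is more useful than returning stale data.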

Anyone who has been involved in software development can relate to the age-old conundrum of trying to reproduce an issue that simply doesn't appear to exist—at least, not on your machine. Too many sleepless nights have been wasted chasing down phantom bugs. It's time for the madness to stop. The problems we're facing are only getting more complex as new technologies are brought to market. This new software paradigm is here to stay. Luckily, I believe new technologies such as record and replay will help control the chaos.

