Channels ▼
RSS

Web Development

Stateful Web Applications that Scale Like Stateless Ones


Orion Letizi is a co-founder and software engineer of Terracotta.


Within every innocent web application lies a sleeping monster. If successful, your application will outgrow its single-machine architecture. The adult web application must expand to live on more than one application server to handle its request load. That's when the latent beast strikes -- the State Monster.

The State Monster represents all of the user-specific state data that your application requires to carry on a web conversation with your users, as well as application state data like data caches, indexes, and the like. In a single-machine architecture, this data lives comfortably as objects on your JVM's heap. In a scaled-out architecture, however, it eats you alive.

The Stateless Convention

The conventional wisdom on scaling web applications is to push all of this state out of the application tier to somewhere else. This "stateless" architecture is supposed to scale by letting you add application servers as user load increases. It's also supposed to be highly available and easy to operate because each individual application server can be brought in and out of service without losing data and interrupting the user experience.

The problem is, where does all that state go?

The Stateless Convention: Push State Down to the Database

The most obvious place to put state data is the database. Most applications already have a database available and it seems like a natural place to externalize data. Unfortunately, it's probably the worst place to put it because it overloads and creates a bottleneck out of one of your most precious resources.

The entire point of the three-tier architecture is to create isolation between the application tier and the data tier. Putting application state into the database binds the availability and scalability characteristics of the application tier to that of the database tier.

Worse, relational databases are not designed to store state data. Application state data is naturally shaped like objects. There is an impedance mismatch between the way databases store data and the way applications store data. A great deal of developer effort and server processing power goes into marshalling and unmarshalling application state data back and forth across this divide.

Stateless architectures backed by a relational database are hard to develop, hard to manage, and expensive to operate. As many have seen, if you go down this path, the State Monster will eat your database.

The Stateless Convention: Push State Up to the Client

A second option is to push state up to the client. If conversational state fits in cookies on the client user-agent, this is a fault-tolerant, highly scalable way of managing user-specific data. However, this approach is really only good for small sessions and doesn't help at all for general application data like your caches, queues, and indexes.

Besides being only a partial solution, it also suffers from the aforementioned impedance mismatch. You still have to explicitly ferry your user state across the network between your application servers and the user-agent. Doing so is considerably more work than just leaving state on the heap.

The Stateless Convention: Push State to Peers

If neither the database nor the client is a safe haven for application state, a third option is to push it out horizontally to other members of the application tier. This approach has the advantage of keeping application state data isolated in the proper tier, but presents a number of critical problems.

First, naively pushing state to every other member of the application tier all the time doesn't scale and bottlenecks on the network. Besides, you almost never need all data everywhere all the time.

To get around the problems of broadcasting all state data to all peers, sometimes the web tier is striped so that chunks of application data can be homed on a particular set of cluster members. The simplest incarnation of this approach is the "buddy system" where every application node has a designated buddy that it copies its state data to. If any single application server goes off-line, its buddy has a working copy of the relevant state and can seamlessly take over the new application load with no loss of data.

More sophisticated implementations of cluster striping can more generically partition data across a cluster. This lets a cluster stripe contain more than just a buddy pair and can be responsible for more than just user-specific data. While cluster striping can alleviate the network bottleneck, it forces your application servers to double as clustered data managers. Because these very different types of work are commingled, they can't be scaled independently of each other, nor can the different quality of service requirements of the application and the clustering service be managed independently. This defeats the entire purpose of a tiered architecture.

And, again, the application server-as-state-server architecture suffers from the same impedance mismatch that we've seen twice already. The object data must somehow be externalized from its home on the JVM heap and ferried to its home-away-from-home in another application server's heap. This commotion causes irreparable damage to the object model and the programming model because the objects that represent state data can no longer be trusted to act like regular Java objects.

If you go down this path of commingling your application services with your cluster services, the State Monster will eat your entire application tier.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 
Dr. Dobb's TV