Stateful Web Applications that Scale Like Stateless Ones

Beware of the State Monster


June 11, 2008
URL:http://www.drdobbs.com/tools/stateful-web-applications-that-scale-lik/208403462

Orion Letizi is a co-founder and software engineer of Terracotta.


Within every innocent web application lies a sleeping monster. If successful, your application will outgrow its single-machine architecture. The adult web application must expand to live on more than one application server to handle its request load. That's when the latent beast strikes -- the State Monster.

The State Monster represents all of the user-specific state data that your application requires to carry on a web conversation with your users, as well as application state data like data caches, indexes, and the like. In a single-machine architecture, this data lives comfortably as objects on your JVM's heap. In a scaled-out architecture, however, it eats you alive.

The Stateless Convention

The conventional wisdom on scaling web applications is to push all of this state out of the application tier to somewhere else. This "stateless" architecture is supposed to scale by letting you add application servers as user load increases. It's also supposed to be highly available and easy to operate because each individual application server can be brought in and out of service without losing data or interrupting the user experience.

The problem is, where does all that state go?

The Stateless Convention: Push State Down to the Database

The most obvious place to put state data is the database. Most applications already have a database available and it seems like a natural place to externalize data. Unfortunately, it's probably the worst place to put it because it overloads and creates a bottleneck out of one of your most precious resources.

The entire point of the three-tier architecture is to create isolation between the application tier and the data tier. Putting application state into the database binds the availability and scalability characteristics of the application tier to that of the database tier.

Worse, relational databases are not designed to store state data. Application state data is naturally shaped like objects. There is an impedance mismatch between the way databases store data and the way applications store data. A great deal of developer effort and server processing power goes into marshalling and unmarshalling application state data back and forth across this divide.
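The cost of that impedance mismatch is easiest to see in code. Here is a minimal sketch of the marshalling tax, with a hypothetical `CartItem` session object and a `Map` standing in for a database row (a real application would be doing this through JDBC or an ORM, with the same hand-written flattening and rebuilding underneath):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical session object: on the heap, it is just an object.
class CartItem {
    final String sku;
    final int quantity;
    CartItem(String sku, int quantity) { this.sku = sku; this.quantity = quantity; }
}

public class Marshalling {
    // To externalize state to a relational store, every field must be
    // flattened into columns by hand (a Map stands in for a DB row here).
    static Map<String, String> toRow(CartItem item) {
        Map<String, String> row = new HashMap<>();
        row.put("sku", item.sku);
        row.put("quantity", Integer.toString(item.quantity));
        return row;
    }

    // ...and the object must be rebuilt from columns on every read.
    static CartItem fromRow(Map<String, String> row) {
        return new CartItem(row.get("sku"), Integer.parseInt(row.get("quantity")));
    }

    public static void main(String[] args) {
        CartItem original = new CartItem("ABC-123", 2);
        CartItem roundTripped = fromRow(toRow(original));
        System.out.println(roundTripped.sku + " x" + roundTripped.quantity);
    }
}
```

Every request that touches session state pays this round trip, in both developer effort and database load.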

Stateless architectures backed by a relational database are hard to develop, hard to manage, and expensive to operate. As many have seen, if you go down this path, the State Monster will eat your database.

The Stateless Convention: Push State Up to the Client

A second option is to push state up to the client. If conversational state fits in cookies on the client user-agent, this is a fault-tolerant, highly scalable way of managing user-specific data. However, this approach is really only good for small sessions and doesn't help at all for general application data like your caches, queues, and indexes.

Besides being only a partial solution, it also suffers from the aforementioned impedance mismatch. You still have to explicitly ferry your user state across the network between your application servers and the user-agent. Doing so is considerably more work than just leaving state on the heap.
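As a rough sketch of what "ferrying state to the user-agent" looks like, the conversational data has to be serialized into a cookie-sized string on every response and parsed back out on every request. This example uses only the standard library's Base64 codec; a production version would also need to sign or encrypt the value, since the client can tamper with it:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of client-side state: conversational data is encoded
// into a cookie value and carried over the network on every request.
public class CookieState {
    static String encode(String userId, int step) {
        String payload = userId + "|" + step;
        return Base64.getUrlEncoder()
                     .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
    }

    static String[] decode(String cookieValue) {
        String payload = new String(Base64.getUrlDecoder().decode(cookieValue),
                                    StandardCharsets.UTF_8);
        return payload.split("\\|");
    }

    public static void main(String[] args) {
        String cookie = encode("alice", 4);  // set on the HTTP response
        String[] state = decode(cookie);     // parsed on the next request
        System.out.println(state[0] + " at step " + state[1]);
    }
}
```

Even in this toy form, the state is no longer a live object; it is a wire format you maintain by hand, and it must fit within the browser's per-cookie size limits.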

The Stateless Convention: Push State to Peers

If neither the database nor the client is a safe haven for application state, a third option is to push it out horizontally to other members of the application tier. This approach has the advantage of keeping application state data isolated in the proper tier, but presents a number of critical problems.

First, naively pushing state to every other member of the application tier doesn't scale: aggregate replication traffic grows with the size of the cluster times the update rate, and the network becomes the bottleneck. Besides, you almost never need all data everywhere all the time.

To get around the problems of broadcasting all state data to all peers, sometimes the web tier is striped so that chunks of application data can be homed on a particular set of cluster members. The simplest incarnation of this approach is the "buddy system" where every application node has a designated buddy that it copies its state data to. If any single application server goes off-line, its buddy has a working copy of the relevant state and can seamlessly take over the new application load with no loss of data.

More sophisticated implementations of cluster striping can more generically partition data across a cluster. This lets a cluster stripe contain more than just a buddy pair and can be responsible for more than just user-specific data. While cluster striping can alleviate the network bottleneck, it forces your application servers to double as clustered data managers. Because these very different types of work are commingled, they can't be scaled independently of each other, nor can the different quality of service requirements of the application and the clustering service be managed independently. This defeats the entire purpose of a tiered architecture.

And, again, the application server-as-state-server architecture suffers from the same impedance mismatch that we've seen twice already. The object data must somehow be externalized from its home on the JVM heap and ferried to its home-away-from-home in another application server's heap. This commotion causes irreparable damage to the object model and the programming model because the objects that represent state data can no longer be trusted to act like regular Java objects.

If you go down this path of commingling your application services with your cluster services, the State Monster will eat your entire application tier.

The Real Problem: Stateless Applications Eat Developer Brains

The elephant in the room in all stateless architectures is that there is no such thing as "stateless." Pushing state out of the application server doesn't get rid of that state, it just forces you to manage it somewhere else. And, it does so at the expense of your most precious commodity: your poor developer brain. It's hard to write stateless applications. Even if it doesn't seem hard to you, the plain fact is that while you are busy dealing with the effects of a stateless architecture, you could instead be implementing new features and improving existing ones.

Externalizing data is hard because all of the work of managing object state that the JVM used to do for you is now your explicit responsibility. You must now tell the clustering engine that you've made a change so that it may commit your changes to the cluster. And, before reading an object's state, you must remember to check it out of the clustering service.
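Here is what that bookkeeping looks like in practice. The `ClusterService` interface below is hypothetical, not any real product's API; it is just a sketch of the check-out/commit ceremony that a "stateless" architecture forces onto every read and write:

```java
// Hypothetical clustering API -- a sketch of the bookkeeping burden,
// not a real product interface.
interface ClusterService {
    Object checkOut(String key);           // fetch and deserialize remote state
    void commit(String key, Object value); // serialize and push changes back
}

public class ManualStateDemo {
    static void incrementCounter(ClusterService cluster) {
        // Every read must be explicitly checked out of the clustering service...
        Integer count = (Integer) cluster.checkOut("page.views");
        count = count + 1;
        // ...and every write explicitly committed, or the cluster never sees it.
        cluster.commit("page.views", count);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the remote service, for demonstration only.
        java.util.Map<String, Object> store = new java.util.HashMap<>();
        store.put("page.views", 41);
        ClusterService cluster = new ClusterService() {
            public Object checkOut(String key) { return store.get(key); }
            public void commit(String key, Object value) { store.put(key, value); }
        };
        incrementCounter(cluster);
        System.out.println(store.get("page.views"));
    }
}
```

Forget a single `commit()` call anywhere in the code base, and the cluster silently loses that update; this is the class of bug the JVM's heap management normally makes impossible.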

Managing state data by hand is hard. That's why the JVM goes to so much trouble to do it for you.

Enter Terracotta: Code Like Your Mom Used to Make

What if you didn't have to do any of this funny business to get scalability and reliability? What if the JVM had access to a service that you could plug into to make its heap durable, arbitrarily large, and shared with every other JVM in your application tier? Enter Terracotta: a network-attached, durable virtual heap for the JVM. In the spirit of full disclosure, I'm a co-founder of Terracotta and work there as a software developer.

Terracotta is an infrastructure service that is deployed as a stand-alone server plus a library that plugs into your existing JVMs and transparently clusters your JVM's heap. Terracotta makes some of your JVM heap shared via a network connection to the Terracotta server so that a bunch of JVMs can all access the shared heap as if it were local heap. You can think of it like a network-attached filesystem, but for your object data; see Figure 1.

Figure 1

The Terracotta server keeps your clustered object data persistent and coherent, and coordinates threads across attached JVMs. The Terracotta client libraries use standard bytecode manipulation techniques to make your existing code cluster aware. All of this is done transparently, driven by a declarative configuration file or through annotations, not through an intrusive API.

Terracotta manages the clustered virtual heap the same way the JVM manages the physical heap. Field updates to clustered objects are automatically and efficiently distributed by Terracotta to the cluster so that all participating JVMs have an up-to-date and stable view of your object data. Terracotta also manages thread interaction between JVMs by making your synchronized(), wait(), and notify() calls cluster aware.
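The programming model the article describes is just ordinary Java. The sketch below is plain single-JVM code; the claim being illustrated is that if this queue were declared a Terracotta shared root in the configuration file, the very same `synchronized`, `wait()`, and `notifyAll()` calls would coordinate threads across all the JVMs in the cluster, with no code changes:

```java
import java.util.LinkedList;
import java.util.Queue;

// Ordinary Java monitor-based coordination. Under Terracotta, declaring
// the queue a shared root would extend these same synchronized/wait/notify
// semantics across JVMs -- the code itself is unchanged.
public class SharedQueue {
    private final Queue<String> tasks = new LinkedList<>();

    public synchronized void put(String task) {
        tasks.add(task);
        notifyAll(); // under Terracotta, this would wake waiters on other JVMs too
    }

    public synchronized String take() throws InterruptedException {
        while (tasks.isEmpty()) {
            wait(); // releases the monitor until a task arrives
        }
        return tasks.remove();
    }

    public static void main(String[] args) throws Exception {
        SharedQueue q = new SharedQueue();
        Thread producer = new Thread(() -> q.put("job-1"));
        producer.start();
        System.out.println(q.take());
        producer.join();
    }
}
```

Notice there is no clustering API anywhere in the class; the cluster-awareness is injected by the Terracotta libraries at class-load time.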

Your code thinks it's talking to threads and objects in the local JVM. Terracotta lets you deploy that same code on as many JVMs as you need to meet your capacity demands without having to write a "stateless" application.

A Terracotta Case Study

To see how this works in the real world, let's take a look at how a large publishing company recently used Terracotta to solve a critical database overload problem in their stateless web application.

The application in question is a test proctoring service that administers examinations to up to 5000 concurrent users. Before using Terracotta, the application was deployed using a stateless architecture backed by a relational database. As users proceeded through the examination, their answers were committed to the database so no in-flight examinations would be in jeopardy should one of the application servers go off-line.

At that level of traffic, the database was already at 70 percent utilization. However, the business was growing, and a forecast of double the number of concurrent users within a matter of months presented the application team with a choice: scale up the overloaded database at significant cost, or investigate some sort of alternative scale-out architecture.

The application team settled on using Terracotta to store the in-flight examination data in the durable virtual heap. They were able to keep state data -- the incomplete examination -- as plain Java objects until the examination was over, at which time the results were posted all at once to the database. By eliminating the chatter with the database during the examination, they were able to double the number of concurrent users and simultaneously reduce their database utilization from a hot 70 percent to a much cooler 30 percent.
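A hypothetical sketch of that model (the class and field names here are illustrative, not the publisher's actual code): answers accumulate as plain objects on the durable heap during the exam, and the database is touched exactly once, at completion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of the case study: in-flight exam state lives as a
// plain object (durable under Terracotta), with one batch write at the end.
public class Examination {
    private final String userId;
    private final Map<String, String> answers = new LinkedHashMap<>();

    Examination(String userId) { this.userId = userId; }

    // Called per question: an in-heap mutation, no database round trip.
    synchronized void answer(String questionId, String answer) {
        answers.put(questionId, answer);
    }

    // Called once when the exam ends: the only point the database is touched,
    // via a single batch insert of the completed results.
    synchronized Map<String, String> finish() {
        return new LinkedHashMap<>(answers);
    }

    public static void main(String[] args) {
        Examination exam = new Examination("user-42");
        exam.answer("q1", "B");
        exam.answer("q2", "D");
        System.out.println(exam.finish().size() + " answers posted in one batch");
    }
}
```

Because the object is on the durable shared heap rather than in one server's memory, any node can pick up the in-flight exam if the server handling it goes off-line.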

Figure 2

Terracotta let them use a natural object model where examinations were represented as simple objects, rather than suffering the impedance mismatch of saving and loading them to and from the database on every request. Yet, because those objects were durable and available to any application server in the cluster, they retained all of the virtues of a stateless architecture with none of its vices.

With their Terracotta-enabled architecture, any application server can be decommissioned at a moment's notice and all of its traffic diverted to other active members of the web cluster without any loss of data or interruption of a user's workflow through the examination. More application servers can be deployed as needed to keep up with demand. And, not only is this scale achieved without additional load on the database, the database load has actually decreased because they no longer use it to hold state data.

Conclusion

Stateless architectures are hard to build and hard to scale. Many an unsuspecting application has been taken down the stateless route, only to find that the added stress on other parts of the application infrastructure confounds its scalability goals.

Luckily, there are tools like those described here that can restore sanity to beleaguered scale-out applications and their developers. With a shared, durable heap, you can write applications with the simplicity of a stateful, single-machine deployment; but, when it comes time to scale, you can deploy on purpose-built scaling infrastructure, unburdening the database and the rest of your infrastructure from the demands of high availability and increasing capacity.
