Scaling SOA with Distributed Computing

Scalability is critical when designing Service-Oriented Architectures.


October 05, 2006
URL: http://www.drdobbs.com/web-development/scaling-soa-with-distributed-computing/193104809

Robert is the CTO and cofounder of Digipede Technologies, and is an accomplished software architect and entrepreneur with more than 15 years of experience developing award-winning applications. You can read his blog at http://et.cairene.net, and he can be contacted at [email protected].

Daniel is the Director of Products at Digipede Technologies, and has been developing enterprise software on Microsoft platforms since 1990. He blogs about distributed computing at http://westcoastgrid.blogspot.com and can be contacted at [email protected].


Enterprise architects are looking to service-oriented architectures (SOA) to solve a litany of real-world problems—brittle systems with tight interdependencies and language lock-in, data stuck in single-purpose silos, and applications that can't scale to meet growing demand, to name a few. The attraction of SOA is that it builds on the concept of reusable software components while emphasizing the service abstraction. Ideally, services in an SOA are interoperable, language agnostic, reusable, independent, stateless, and autonomous, and each publishes a clear contract. To enable interoperability, services should be composable, loosely coupled, and standards compliant. These should be your goals when designing an SOA.

SOA concepts stand in contrast to tightly coupled, object-centric architectures such as Distributed COM (DCOM) or CORBA. To create service components with these technologies, you build a component, define a distributed interface, and distribute that interface definition to all clients of the component. Every client of the service must also use the same technology. In contrast, SOA resolves many of the limitations inherent in these earlier technologies by building on top of standards-based services.

Wise choices for implementation of an SOA can accomplish some of the design goals essentially for free. For example, by relying on standards-based web services as your basic building block, you get an interoperable, language-independent architecture. However, other design goals—scalability, for instance—are not so easy to come by.

SOA and Scalability

Creating a scalable SOA requires analyzing scalability requirements and issues at design time. In working with enterprise clients across different industries, we've found that these include the number of simultaneous users, peak and variable usage patterns, the size of individual work requests, and the CPU intensity and duration of the underlying computations.

While some of these issues might apply to the entire SOA, others apply only to specific services inside the greater architecture. Still, the result is the same: You can't build a scalable SOA on top of services that don't scale.

Of course, the focus on scalability must be traded off against other goals—budget constraints, transactions per second of a key system, user security, and the like. Unfortunately, we've found that in the planning stages scalability tends to fare poorly against these other goals, only to become a far greater problem during rollout.

Evaluating the requirements and capabilities of your services helps you locate the scalability hot spots. For example, if your services are stateless, autonomous, short running, and not CPU intensive, simple web-server scaling options will suffice. However, for services with even moderate CPU intensity (reformatting a PDF, for instance), these options come up short.

Time and again, customers have found that as their services become more widely available within their enterprises, usage patterns change dramatically. From the perspective of the SOA, a service implies simultaneous users and disparate usage (perhaps with peak loads and variable work sizes). But was your service designed with that in mind? And what if your service must remain highly available to the greater enterprise, yet also consumes significant CPU time?

Scaling Compute Load

One of our customers first approached Digipede (www.digipede.net) while designing their first implementation of an SOA. What led them to SOA was the need to expand internal access to a critical financial-analytics application that performed both short- and long-running computations. They needed to give many more people access to this service and expected demand to keep growing. At the same time, they were under pressure to reduce the turnaround time for long-running analytics requests.

A critical issue for this company's service—and for services in many SOAs—is that it is highly compute intensive. Typical analytics for a financial-services company involve running scenarios against years of historical data, and as in many such simulations, better results come from running more scenarios, which demands more computing power. Clearly, this client needed to improve the performance of the service in a way that could grow with future demand.

The classic approach to scaling up a service is to acquire the fastest machine possible. However, this approach can be prohibitively expensive; the cost of a 32-way SMP box can easily top $1 million. Moreover, scaling up is ultimately a dead end: The size of the largest available server caps your scalability. To avoid that expense while still achieving the required scalability, many enterprises are turning to distributed computing, bringing together many lower cost computers to achieve the desired result.

Frequently, we find people insisting that network load balancing (NLB) is a solution to this problem. It is not. NLB balances network load, not CPU load: It directs traffic to the servers completing the fewest transactions per interval of time. This is a poor measure of load for compute-intensive tasks. A server that is working on a long-running, CPU-intensive request completes few (or even zero) requests over the interval and therefore looks underutilized to NLB. What does NLB do in this case? It piles more work onto that machine. A round-robin configuration of NLB doesn't solve the problem either.

What Did They Do?

To address this scalability problem, our customer installed a compute grid—in this case, our Digipede Network—behind their services. A compute grid distributes computation across a flexible configuration of computing resources, matching appropriate work with those resources. While there are disagreements about the definition of "grid computing," several factors are fundamental to all flavors of compute grids: a shared pool of compute resources, a mechanism for matching work to the resources suited to run it, and monitoring that ensures tasks complete and results are returned.

In addition, individual off-the-shelf grid solutions may provide other capabilities, such as system monitoring and control, distribution of applications and data, standardized management interfaces, and other features. Ideally, your grid solution is also a first-class service in its own right, with your other services composed on top of it.

Recalling the design goals for services in SOAs, you can see that implementing a grid infrastructure behind your service is consistent with the principles of autonomy, independence, and reuse.

System Architecture Before Grid

By the time the customer approached us about using grid technologies behind their SOA, they had already implemented a common pattern like that in Figure 1. Essentially, they had decoupled their systems by moving their analytics behind a web service. This first step enabled greater access to the analytics, but it did nothing to make the analytic service itself scalable, and typical network load-balancing strategies would not solve the problem of distributing computational load. Their problems were exacerbated because CPU-intensive tasks (both long- and short-running) were crippling the web server: Even the turnaround time for simple, noncompute-intensive requests increased greatly when the system was under load. Such unexpected drops in quality of service plague many SOA efforts and, unless addressed quickly, can cost a project the confidence of its stakeholders.

Figure 1: Original web service.

The customer had already solved one of the thorny issues in SOA—dealing with long-running requests. This issue can be complex because many client applications (and technologies) are not services themselves and not directly addressable in the architecture. As a result, a full-duplex architecture, where the analytics service directly notifies the client application when the result is complete, is often not viable; see the sidebar "Request Duration" for more information.

In our customer's case, a full-duplex architecture didn't make sense because of requirements on the client side. Instead, they adopted a more traditional request-and-response pattern—a composite of SOA and client-server practices. This pattern separates the initiation of a service request from the request for its results. When the initial request is made, the service returns a persistent token; in subsequent requests, the client uses this token to check status or cancel the request. While there is some inefficiency in this approach, it alleviates web-server timeout problems, and it supports both classic and AJAX-based browser applications.
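
To make the pattern concrete, here is a minimal sketch of what such a token-based contract might look like as an ASP.NET (.asmx) web service. The type and method names are our own invention for illustration, not the customer's actual interface, and an in-memory dictionary stands in for the session database described later:

using System;
using System.Collections.Generic;
using System.Web.Services;

// Hypothetical request type and status values; the customer's real
// schema is not shown in this article.
public class AnalysisRequest { public string Scenario; public int Iterations; }
public enum AnalysisState { Pending, Running, Complete, Cancelled }

public class AnalyticsService : WebService
{
    // In production this state lives in a database; a static dictionary
    // stands in here to keep the sketch self-contained.
    private static readonly Dictionary<string, AnalysisState> Sessions =
        new Dictionary<string, AnalysisState>();

    // Initiate an analysis; returns immediately with a persistent token.
    [WebMethod]
    public string SubmitAnalysis(AnalysisRequest request)
    {
        string token = Guid.NewGuid().ToString("N");
        lock (Sessions) { Sessions[token] = AnalysisState.Pending; }
        // ...hand the computation off for asynchronous execution...
        return token;
    }

    // The client polls with the token it received from SubmitAnalysis.
    [WebMethod]
    public AnalysisState CheckStatus(string token)
    {
        lock (Sessions) { return Sessions[token]; }
    }

    // The same token is used to cancel an in-flight request.
    [WebMethod]
    public void Cancel(string token)
    {
        lock (Sessions) { Sessions[token] = AnalysisState.Cancelled; }
    }
}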

Life on the Grid

Our client addressed its SOA scalability issues by decomposing its web-service application into two distinct parts. Figure 2 shows how the original web service was modified to work with a grid-computing infrastructure: The web service and its computational code were split into separate modules. The front end followed the same contract that had previously been in place, while the computational work performed by the service was offloaded to the grid.

Figure 2: Original web service modified to work with a grid-computing infrastructure.

The grid addresses the scalability issues beautifully. Each CPU-intensive analysis (both long- and short-running) takes advantage of an available CPU on the grid; the grid system monitors these tasks and ensures that results are returned in a timely manner. Because the grid balances CPU load, individual machines are never overloaded. Increases in demand can be met by adding more hardware to the grid (and the Digipede Network automatically provisions the analytic applications to run on those machines).

Because the computationally intensive applications run elsewhere, the machine running the web service can handle much more traffic. If traffic there becomes an issue, that web service can be scaled using traditional web scaling methods (NLB, for instance).

How It Works

To adapt their analytics to run on a grid, the customer used the Digipede Framework SDK. They had already decided to build the analytic service on .NET, so they needed a solution that worked natively in that environment. Because the developers had already encapsulated the calculations in .NET classes, the adaptation was quick: By packaging those classes in separate .NET assemblies, they could submit analytics requests for execution on the Digipede Network.
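
As a rough illustration of that encapsulation, a calculation class might look like the following. The class and its members are assumptions on our part; the key point is that the object is serializable, carrying its inputs out to the grid and its results back:

using System;

// Hypothetical analytics class; [Serializable] lets the grid
// infrastructure move the object (and later its results) across the wire.
[Serializable]
public class ScenarioAnalysis
{
    // Inputs, set before the object is shipped to a grid node.
    public DateTime Start;
    public DateTime End;
    public int ScenarioId;

    // Output, populated on the grid node and serialized back.
    public double Result;

    // Invoked on the remote machine to perform the computation.
    public void Execute()
    {
        Result = RunModel(Start, End, ScenarioId);
    }

    private static double RunModel(DateTime start, DateTime end, int id)
    {
        return 0.0; // stand-in for the real long-running financial model
    }
}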

The service itself is built with ASP.NET, hosted in IIS 6, and maintains session information in a SQL Server database. When a new analytics request comes into the service, a token is generated and returned to the client. This token is the key to all analysis-specific information for the life of the computation.
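
A sketch of that token generation, assuming a simple session table (the table and column names here are invented, not the customer's actual schema):

using System;
using System.Data.SqlClient;

public static class SessionStore
{
    // Generate a persistent token and record it in the session database.
    public static string CreateSession(string connectionString)
    {
        string token = Guid.NewGuid().ToString("N");
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(
            "INSERT INTO AnalysisSessions (Token, Status) VALUES (@t, 'Pending')",
            conn))
        {
            cmd.Parameters.AddWithValue("@t", token);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
        return token;
    }
}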

Depending on the duration of the calculation needed to fulfill the request and on how parallelizable the analysis is, the service has two options: send the analysis to the grid as a whole, or decompose it into separate, parallel pieces and send all of those to the grid.
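
In code, that submit-time decision might look something like the following sketch, which reuses the hypothetical ScenarioAnalysis class above; "one task per scenario" is our assumption about how the work decomposes:

using System;
using System.Collections.Generic;

public static class TaskPlanner
{
    public static List<ScenarioAnalysis> PlanTasks(
        bool parallelizable, int scenarioCount, DateTime start, DateTime end)
    {
        List<ScenarioAnalysis> tasks = new List<ScenarioAnalysis>();
        if (parallelizable && scenarioCount > 1)
        {
            // Decompose into separate, parallel pieces: one task per scenario.
            for (int i = 0; i < scenarioCount; i++)
            {
                ScenarioAnalysis piece = new ScenarioAnalysis();
                piece.Start = start;
                piece.End = end;
                piece.ScenarioId = i;
                tasks.Add(piece);
            }
        }
        else
        {
            // Send the analysis to the grid as a whole: a single task.
            ScenarioAnalysis whole = new ScenarioAnalysis();
            whole.Start = start;
            whole.End = end;
            whole.ScenarioId = 0;
            tasks.Add(whole);
        }
        return tasks;
    }
}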

Again, depending on the achievable level of parallelization, the service instantiates one or more objects to perform the analysis. These objects act as data holders for initiating the computation remotely; collectively, they make up a job on the grid. The service submits this job to the Digipede Server and receives back a job-specific ID that uniquely identifies the job on the grid. That ID is stored in the session database, indexed by the token, for later reference.
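
The article doesn't show the Digipede Framework SDK's actual API, so the following sketch uses an invented interface purely to illustrate the shape of the interaction: the task objects are bundled into one job, the job is submitted, and the returned ID is recorded against the session token:

using System.Collections.Generic;

// Invented stand-in for the vendor SDK client; not the real Digipede API.
public interface IGridClient
{
    int SubmitJob(List<ScenarioAnalysis> tasks); // returns a job-specific ID
}

public static class JobSubmission
{
    // Submit the tasks as one job and remember which job belongs to
    // which session token.
    public static int Submit(IGridClient grid, List<ScenarioAnalysis> tasks,
                             string token)
    {
        int jobId = grid.SubmitJob(tasks);
        RecordJobId(token, jobId);
        return jobId;
    }

    // Hypothetical helper: writes the job ID into the session database,
    // indexed by the token, e.g.
    // UPDATE AnalysisSessions SET JobId = @j WHERE Token = @t
    private static void RecordJobId(string token, int jobId)
    {
    }
}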

Meanwhile, each of the analysis objects is serialized automatically and distributed to an appropriate machine on the grid. The Digipede Agent software running on each of those machines determines whether the machine can work on the job (based on its hardware, software, and availability). If the objects require any assemblies that are not already on the computer (for example, if a DLL has been updated recently), the Agent retrieves the necessary files before working on the job. When ready, it deserializes an object, executes the analysis, then serializes the object again (this time, with results in tow) and returns it to the Digipede Server. The server itself monitors the status and health of each agent and reassigns any work that does not complete successfully.
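
The Agent's internals aren't shown in the article, but the deserialize/execute/serialize round trip it describes can be sketched with the standard .NET BinaryFormatter, again reusing the hypothetical ScenarioAnalysis class:

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class TaskRoundTrip
{
    // Deserialize a task object, run it, and serialize it again,
    // this time with its Result field populated.
    public static byte[] ExecuteSerializedTask(byte[] payload)
    {
        BinaryFormatter formatter = new BinaryFormatter();

        // Rehydrate the analysis object shipped from the server.
        ScenarioAnalysis task;
        using (MemoryStream input = new MemoryStream(payload))
        {
            task = (ScenarioAnalysis)formatter.Deserialize(input);
        }

        task.Execute(); // run the computation on this grid node

        // Ship the object back with results in tow.
        using (MemoryStream output = new MemoryStream())
        {
            formatter.Serialize(output, task);
            return output.ToArray();
        }
    }
}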

Meanwhile, the client can query the web service to check the status of the job. To do so, the client calls the analysis service with the same token it received in response to its initial call. The service uses that token to look up the job ID in the session database, then queries the Digipede Server for status and results. The status of the job and of each individual task can be returned, along with any result objects. If the job is not complete, the system can provide an estimated completion time.
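
From the client's point of view, the exchange reduces to a polling loop against the hypothetical contract sketched earlier (a real client would call through a generated web-service proxy rather than the service class directly):

using System;
using System.Threading;

public static class AnalysisClient
{
    // Reuse the token from the initial call until the service reports
    // that the job has finished or been cancelled.
    public static AnalysisState WaitForCompletion(
        AnalyticsService service, string token)
    {
        while (true)
        {
            AnalysisState state = service.CheckStatus(token);
            if (state == AnalysisState.Complete ||
                state == AnalysisState.Cancelled)
            {
                return state;
            }
            Thread.Sleep(TimeSpan.FromSeconds(5)); // poll at a modest interval
        }
    }
}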

Conclusion

As our customer realized, scalability is an important consideration in designing and building a service-oriented architecture. Fortunately, they grasped this at the right time—in the design phase. Trying to retrofit scalability onto an existing system can cost more in time and effort than the initial implementation. The risks are large: Rolling out an SOA that cannot handle your organization's load endangers the success of the entire project.

To meet our customer's scalability needs, designing their SOA to run on a grid was the ideal solution. Their SOA now has ample capacity and a simple plan for growth: They can handle nearly 50 times the usage the single server could support, and adding commodity hardware expands capacity further. Through parallelization, they also dramatically sped up their longer transactions (an analysis that formerly took over an hour now completes in just a few minutes). Best of all, this grid will underpin all of their SOA efforts going forward: By deploying a common infrastructure, they will save significant development time and IT effort.

Request Duration

There are different scenarios in which it makes sense to offload computation from your web service and distribute it. The first that comes to most people's minds is the long-running task that pegs your CPU; such applications are fairly common in intensive analytics and scientific computation. Even more common is the short-running task that consumes the CPU for a briefer period. For example, if your service generates dynamic content for a PDF, the work might take only 15 seconds, yet consume an entire CPU on the server for that time.

While both of these scenarios are a good fit for distributed computing, they call for different approaches to session state. A short-running task can be serviced with a simple request-response model, and the service can remain stateless. This is the simplest model for service requests and, to a client, looks much like an ordinary method call. It can also be highly scalable, depending on how you configure your web service and on its ability to reuse threads to manage concurrent connections efficiently.

Long-running tasks are more complex. You cannot assume that your client can maintain a connection to your web service throughout the life of a task that takes 15 minutes, much less one hour or two days. In this case, you need a solution that follows either a full-duplex pattern (where your client is itself a service and is notified when the task completes) or a polling scheme (where your client checks back later to get the results). Both solutions require stateful services. The full-duplex pattern becomes straightforward to implement using the Windows Communication Foundation (formerly code-named "Indigo") included with .NET 3.0.
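
For instance, a duplex contract in WCF can be declared along these lines (the interface and method names are illustrative, not from the article):

using System.ServiceModel;

// The callback contract the service uses to notify the client.
public interface IAnalysisCallback
{
    [OperationContract(IsOneWay = true)]
    void AnalysisComplete(string token, double result);
}

// The service contract; WCF pairs it with the callback contract so the
// client is called back when the task completes.
[ServiceContract(CallbackContract = typeof(IAnalysisCallback))]
public interface IAnalysisService
{
    [OperationContract]
    string SubmitAnalysis(string scenario);
}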
