Scaling SOA with Distributed Computing

What Did They Do?

To address this scalability problem, our customer installed a compute grid (in this case, our Digipede Network) behind their services. A compute grid distributes computation across a flexible configuration of computing resources, matching each piece of work to the resources suited to it. Although the definition of "grid computing" is disputed, several characteristics are fundamental to all flavors of compute grids:

  • Grids are inherently scalable. Resources can be added easily to increase performance.
  • Grids have dynamic capacity. They can be reconfigured to fit the current business need.
  • Grids are robust against computer failure and temporary disconnection.
  • Grids can incorporate both shared and dedicated resources, each configured to process appropriate workloads.
  • Grids provide CPU load balancing.
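
To make the scalability and load-balancing points concrete, here is a minimal sketch of one simple balancing strategy: each task goes to whichever node currently has the least accumulated work. This illustrates the general idea only. It is not how the Digipede Network or any particular grid product schedules work, and the function names, node names, and task costs are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def assign_tasks(task_costs: List[float], nodes: List[str]) -> Dict[str, List[int]]:
    """Greedy least-loaded assignment (an illustrative assumption, not a product API).

    Each task, in order, is given to the node with the smallest accumulated
    cost so far. Returns a mapping of node name -> list of task indices.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    # Min-heap of (accumulated_cost, node_name); ties break on node name.
    heap: List[Tuple[float, str]] = [(0.0, name) for name in nodes]
    heapq.heapify(heap)
    plan: Dict[str, List[int]] = {name: [] for name in nodes}
    for index, cost in enumerate(task_costs):
        load, name = heapq.heappop(heap)
        plan[name].append(index)
        heapq.heappush(heap, (load + cost, name))
    return plan


def makespan(task_costs: List[float], plan: Dict[str, List[int]]) -> float:
    """Total cost on the busiest node, i.e. when the whole batch finishes."""
    return max(sum(task_costs[i] for i in tasks) for tasks in plan.values())
```

With hypothetical task costs of 4, 3, 2, and 1, a single node finishes the batch at time 10, two nodes at time 5, and three nodes at time 4. Adding a node shortens the batch without any change to the tasks themselves, which is the sense in which a grid is inherently scalable.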

In addition, individual off-the-shelf grid solutions may provide further capabilities, such as system monitoring and control, distribution of applications and data, and standardized management interfaces. Ideally, your grid solution is also a first-class service in its own right, with your other services composed on top of it.

Recalling the design goals for services in SOAs, you can see that implementing a grid infrastructure behind your service is consistent with the principles of autonomy, independence, and reuse.

System Architecture Before Grid

By the time the customer approached us about using grid technologies behind their SOA, they had already implemented a common pattern like that in Figure 1. Essentially, they had decoupled their systems by moving their analytics behind a web service. This first step had already enabled greater access to the analytics, but it didn't make the analytic service itself scalable. There was no inherent scalability, and typical network load-balancing strategies would not solve the problem of distributing computational load. Their problems were exacerbated because CPU-intensive tasks (both long- and short-running) were crippling the web server. Even simple requests that were not compute-intensive took much longer to complete when the system was under load. Such unexpected decreases in quality of service plague many SOA efforts and, unless addressed quickly, can result in a loss of stakeholder confidence in the project.

Figure 1: Original web service.

The customer had already solved one of the thorny issues in SOA—dealing with long-running requests. This issue can be complex because many client applications (and technologies) are not services themselves and not directly addressable in the architecture. As a result, a full-duplex architecture, where the analytics service directly notifies the client application when the result is complete, is often not viable; see the sidebar "Request Duration" for more information.

In the case of our customer, a full-duplex architecture didn't make sense because of requirements on the client side. Instead, they adopted a more traditional request-and-response pattern, a composite of SOA and client-server practices, that separates the initiation of a service request from the request for its results. When the client makes the initial request, the service returns a persistent token. In subsequent requests, the client uses this token to check the status of the request or to cancel it. While there is some inefficiency in this approach, it does alleviate web server timeout problems. In addition, this approach supports both classic and AJAX-based browser applications.
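
The token-based pattern can be sketched in a few lines. The example below is a minimal in-process illustration under stated assumptions, not the customer's implementation: the class name TokenService and its method names are hypothetical, a thread pool stands in for the analytics back end, and the job table is an in-memory dictionary, whereas a real service would keep tokens in durable storage so that they remain valid across requests and server restarts.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class TokenService:
    """Hypothetical stand-in for the web service front end."""

    def __init__(self, max_workers: int = 2) -> None:
        # Assumption: a local thread pool plays the role of the analytics back end.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # Assumption: in-memory job table; a real service would persist this.
        self._jobs: Dict[str, Future] = {}

    def submit(self, func: Callable[..., Any], *args: Any) -> str:
        """Start the work and return immediately with a token."""
        token = uuid.uuid4().hex
        self._jobs[token] = self._pool.submit(func, *args)
        return token

    def status(self, token: str) -> Dict[str, Any]:
        """Report the state of a request without blocking the caller."""
        future = self._jobs.get(token)
        if future is None:
            return {"state": "unknown"}
        if future.cancelled():
            return {"state": "cancelled"}
        if not future.done():
            return {"state": "pending"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "result": future.result()}

    def cancel(self, token: str) -> bool:
        """Try to cancel; succeeds only if the work has not started yet."""
        future = self._jobs.get(token)
        return future is not None and future.cancel()
```

A client would call submit once, keep the returned token, and then call status at intervals until the state is no longer "pending". Because every call returns quickly, no single HTTP request has to stay open for the duration of the computation, which is how the pattern sidesteps web server timeouts at the cost of some extra polling traffic.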

