Dr. Dobb's | Real Users Really Matter

To deal with the complexity of web applications, there has to be a link between development and operations.

Deploying web applications and managing their performance requires new tools and new approaches. Because web applications and their infrastructure are complex, performance has to be "baked in" before an application is launched into production. But that same complexity diminishes a developer's ability to fully test and characterize the performance of new applications running on production infrastructures. The result is that nearly half of all application outages are uncovered by end users. With web applications' growing importance to the bottom line, assuring web application performance at all levels becomes a critical challenge.

The solution to this problem is to implement an early detection and rapid response process during development that can also be used in production. Actionable information collected using this process can then identify areas for performance optimization, infrastructure tuning, and quick problem resolution.

It's All About the Real User

Web applications are no longer developed one line of HTML at a time. They are typically orchestrated using existing internal or third-party web services with unknown performance characteristics and often invoked across the web cloud. To further complicate the task of managing performance, the dynamic nature of this new generation of web applications results in different users traversing the application infrastructure through divergent paths, making it difficult (if not impossible) to recreate and diagnose performance problems.

At the same time, users of web applications are demanding a richer experience that requires streaming multimedia content and fat-client-like capabilities. Web 2.0 applications are no longer a sequence of static HTML pages. They depend heavily on the capability of the end user's computer to constantly re-render the page or host a Flash or Silverlight player, and they demand robust last-mile connectivity to handle chatty XML calls.

How can you deliver feature-rich applications with a level of performance that captivates today's hyper-impatient end users? The question is crucial in an environment where a few seconds can mean the difference between satisfied and former users.

Traditional monitoring tools and techniques focus on monitoring the performance of servers. This is because of the strong correlation between server performance and positive end-user experience. But the complexity of web applications has changed all that. A functioning server can no longer ensure that "real" users are experiencing acceptable application performance from across the Web. A hundred different things can (and will) go wrong between the real user's browser and the content or data that the user is accessing. As a result, the key to managing web application performance is to accurately measure performance as experienced by real users at the browser where web applications come together. Real user experience is fundamentally the only true measure of web application performance.
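The gap between what the server reports and what the user experiences can be illustrated with a small sketch. It assumes a hypothetical browser instrumentation that reports, for each page view, when the user initiated the request, when the page finished rendering, and how long the server spent on the request. The record layout and field names are invented for illustration and do not describe any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beacon:
    """One page view as reported by hypothetical browser instrumentation."""
    user_id: str
    click_ms: int     # when the user initiated the request (browser clock)
    rendered_ms: int  # when the page finished rendering (same browser clock)
    server_ms: int    # processing time the server reported for this request


def user_perceived_ms(beacon: Beacon) -> int:
    """Response time as the real user experienced it."""
    return beacon.rendered_ms - beacon.click_ms


def outside_server_ms(beacon: Beacon) -> int:
    """Time spent everywhere except the server: network, third parties, rendering."""
    return user_perceived_ms(beacon) - beacon.server_ms


# Illustrative numbers only: the server looks healthy, the user waits over four seconds.
beacon = Beacon(user_id="u1", click_ms=1000, rendered_ms=5200, server_ms=300)
print(user_perceived_ms(beacon), outside_server_ms(beacon))
```

Both timestamps come from the same browser clock, so the subtraction does not depend on the browser and server clocks being synchronized.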

Measuring the real user's experience of web application performance is useful from a reporting perspective; however, the data by itself is not actionable. According to Forrester analyst Jean Pierre Garbani, web application performance has to be monitored and managed at the granularity of each individual transaction, not from a silo (cloud, logical server, code, database, and the like) or infrastructure component (PC, routers, servers, Internet/WAN/LAN, and so on). Poorly performing transactions from the real user's perspective should be traced from browser to database (including third-party web service calls) to produce a map of the transaction path through the infrastructure as well as the time consumed by each infrastructure tier.

The purpose of mapping the transaction path of real transactions initiated by real users is to facilitate the identification of the causes of performance problems or bottlenecks. Each real transaction might follow a different path through a complex (and potentially virtualized) infrastructure, which makes it difficult and time consuming to correlate data stored in configuration management databases in order to pinpoint problems. With a browser-to-database mapping of transactions from the real user's perspective, and with the time consumed by each infrastructure tier clearly measured, developers can more easily identify the cause of any performance degradation. In most organizations, the process of triaging, re-creating, and diagnosing a problem consumes most of the time needed to resolve it. With this new approach, the labor cost and time to resolution of the real user's performance problem can be greatly reduced.
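As a minimal sketch of how per-tier timings point to a bottleneck, the following assumes a transaction trace is available as a list of tier names with the time each tier consumed. The `TierTiming` record, the `summarize_trace` helper, and the tier labels are hypothetical names chosen for illustration; a real tracing tool would supply its own data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierTiming:
    """Time one infrastructure tier spent on a single transaction."""
    tier: str          # hypothetical tier label, e.g. "browser" or "database"
    elapsed_ms: float  # time consumed by this tier, in milliseconds


def summarize_trace(trace: list[TierTiming]) -> dict:
    """Return total time, each tier's share of it, and the slowest tier.

    Assumes each tier appears at most once in the trace.
    """
    if not trace:
        raise ValueError("trace must contain at least one tier timing")
    total = sum(t.elapsed_ms for t in trace)
    slowest = max(trace, key=lambda t: t.elapsed_ms)
    shares = {t.tier: t.elapsed_ms / total for t in trace} if total > 0 else {}
    return {"total_ms": total, "slowest_tier": slowest.tier, "shares": shares}


# Illustrative numbers only, not measurements from any real system.
trace = [
    TierTiming("browser", 300.0),
    TierTiming("network", 200.0),
    TierTiming("web server", 100.0),
    TierTiming("third-party web service", 1000.0),
    TierTiming("database", 400.0),
]
summary = summarize_trace(trace)
print(summary["total_ms"], summary["slowest_tier"])
```

In this made-up trace the third-party web service accounts for half of the total time, which is the kind of finding that is hard to reach by inspecting each server in isolation.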

Building a Glass Wall

With the growing complexity of web applications, having a process for continued performance improvement and problem avoidance is critical. Yes, performance issues occur and are sometimes unavoidable due to situations beyond the control of developers or operations personnel. Again, the key is to bake in performance through a culture of cooperation where developers and operations work together so that performance problems can either be resolved proactively, or detected and resolved quickly before they impact user satisfaction.

The Information Technology Infrastructure Library (ITIL; www.itil-officialsite.com) framework segregates the software development lifecycle into six phases: requirements, design, build, deploy, operate, and optimize (Figure 1). The last two phases of the lifecycle are particularly important when dealing with complex web applications. Web applications are never truly "completed" because complexity, lack of control over third-party infrastructure service providers, and the ever-increasing time-to-market pressure for web applications prevent exhaustive testing of all possible use scenarios.

To deal with the complexity of web applications, there has to be a linkage between the traditionally discrete development and operations functions. In other words, development and operations need a common view of business impact, real user performance, application infrastructure performance, incidents, and problems. Ideally there is a single tool and set of metadata that can bridge these two functional groups, offering developers opportunities for optimization and giving operations personnel the ability to efficiently identify and remedy problems.

For complex web applications, developers are constantly called on to deal with production problems, whether to patch code-level problems that impact performance, or infrastructural problems that require workarounds. And because of their knowledge of the application, they are also called on to serve on triage teams attempting to recreate or diagnose potential or real performance problems. In fact, Gartner Group reports that nearly 40 percent of a developer's time is consumed by production problems. This activity has a tremendous impact on the development schedule and developer productivity.

The automated and continuous monitoring and diagnosis of transactional problems from a real user perspective performs three important functions in a production setting:

To facilitate cooperation between development and operations in matters of web application performance, there has to be a common platform for sharing performance information (metadata) that's relevant to both teams, and a defined process for acting on that information. In a way, this is akin to replacing the traditional "Chinese Wall" that separates development from operations with a "glass wall." A metaphorical wall, or predefined and enforceable set of business policies, is important so that developers cannot arbitrarily modify released code, or the underlying database or infrastructure running the code, without following proper release, change, and configuration management protocols. Instead of being opaque, this "wall" should be transparent, so that information is exchanged between the two functional groups.

Development and operations use different tools. Data collected or generated by development tools has no meaning to the system management or DBA tools used by operations personnel, and vice versa. This Tower of Babel situation, if not remedied through the use of a common tool and metadata, makes the implementation of the ITIL process impractical.

Step-by-Step, Putting It Together

Developers need a tool that measures the right service-level metrics and that can help them discover potential performance issues from a real user's perspective, with granularity down to individual transactions. By monitoring and tracing both synthetic and user-generated transactions while testing an application beta release, developers can proactively resolve issues that might impact production ramp up, including abnormalities with a low probability of occurrence but severe consequences. The actionable transaction performance information can also be utilized to tune the application and infrastructure so as to minimize the variability in response time over the entire spectrum of users. Finally, with the application in production, ITIL calls for continued optimization. A common tool that provides both the extensive transactional information required by developers and the scalability and low overhead of an operations tool is needed to facilitate cooperation and metadata transparency.
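To make "variability in response time over the entire spectrum of users" concrete, the sketch below summarizes a set of response-time samples with a median, a 95th percentile, a standard deviation, and a count of samples over a target. The choice of metrics, the nearest-rank percentile method, and the 2,000 ms target are assumptions made for illustration; they are not prescribed by ITIL or by any tool discussed here.

```python
import math
import statistics


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def response_time_report(samples_ms: list[float], target_ms: float) -> dict:
    """Summarize typical response time, tail behavior, and target violations."""
    return {
        "median_ms": statistics.median(samples_ms),
        "p95_ms": percentile(samples_ms, 95),
        "stdev_ms": statistics.pstdev(samples_ms),
        "over_target": sum(1 for s in samples_ms if s > target_ms),
    }


# Illustrative samples: nine ordinary transactions and one severe outlier.
samples_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 5000]
report = response_time_report(samples_ms, target_ms=2000)
print(report)
```

The median of these made-up samples looks acceptable, while the 95th percentile exposes the rare but severe case, which is why a tail metric is worth tracking alongside the typical one.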

Table 1 summarizes the needs and requirements for cross-functional tools that can bridge the deployment phase of the ITIL application lifecycle with the operation and optimization phases, as well as the functional needs of developers during deployment, and the on-going 24/7 operations monitoring and problem resolution needs of operations personnel.

Conclusion

Monitoring and diagnosing performance problems from a real user's perspective during the deployment and operation phases of the application lifecycle can assure the performance of web applications and reduce the time to problem resolution in production. This approach offers data that can bridge the gap between development and IT operations, baking in performance before production and giving the two groups an effective channel of communication for resolving performance issues and for restoring and optimizing application performance.

Needs	Functional Requirements
Direct measurement of end-user response time, especially for composite web applications	Nonintrusive instrumentation of end-user browser for direct end-to-end performance measurement
Discover all real transactional paths through infrastructure and application	Trace all real transactions from browser to back-end to the specificity of inside/outside firewall, tier, server, web service, and method/query level
Identify infrastructure and application bottlenecks	Record and report time taken by each constituent element (whether infrastructure or application) to process transaction
Common platform for developers and operations personnel	Record end-to-end transaction performance and tracing information, yet remain scalable with low overhead for production use
Efficient workflow facilitating cooperation between functional groups: business, development, and operations	Case-based workflow centered around business groups supported by extensive business-impact reporting

Table 1: Functional requirements