Ken North

Dr. Dobb's Bloggers

Performance and Data Access Part 1: Time, Transactions, Packets

November 01, 2009

Performance, reliability and security continue to challenge system architects, as they have throughout the era of distributed computing. For systems with a database and network infrastructure, the performance, reliability and security challenges today are substantial. Whether applications are a composite of components, classes, assemblies, libraries, scripts, services or all of the above, an old adage still applies:

A chain is only as strong as its weakest link.

Experience has taught us that dealing with the weakest link in a system architecture is often an ongoing process of remediation for reliability, security and performance problems. In web services, enterprise applications, cloud computing and other distributed processing, the weak link can produce performance bottlenecks that we often quantify using time.

For some computing technologies, a relationship to time can be a defining characteristic. Time-division multiplexing provides a solution for processing multiple data streams to handle communication with multiple devices. Some operating systems have implemented multitasking with the processor scheduling task execution based on time slices. Computer timesharing was a catalyst for solutions to monitor program execution time and account for processor and storage usage.

In my salad days, a NASA project on which I worked was a prime example of real-time software that required 24x7 operations. The Goddard Real-Time System handled a telemetry stream from spacecraft that supported monitoring of biomedical data. The real-time moniker applies to software that processes data so rapidly it permits decision-making as events unfold, whether it's monitoring spacecraft, aircraft or stock prices or handling live video streams. Today complex event processing (CEP) software and in-memory databases are geared to meeting the real-time requirement.

Having become accustomed to using time as a measure of performance, we easily understand that Usain Bolt's performance on the track is extraordinary because of a pattern of world-record times. As in track, the metrics produced by performance benchmarks of computers and software are often time-centric. For hard drives, we use benchmarks to compute average read transfer performance, with throughput expressed as megabytes per second. We evaluate CPU chips by comparing execution times using benchmarks that include a mix of tasks. For benchmarks of SQL database query processing, we use a mix of SQL queries to measure average execution times.
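To make the idea of timing a query mix concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name, the three queries and the repeat count are illustrative assumptions, not part of any standard benchmark, and a real benchmark would control for caching, warm-up and concurrency.

```python
import sqlite3
import statistics
import time


def average_query_time(conn, queries, repeats=5):
    """Return the mean wall-clock time, in seconds, per query in the mix."""
    timings = []
    for _ in range(repeats):
        for sql in queries:
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch every row so the query really runs
            timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Hypothetical test data: an in-memory table of 10,000 orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(i * 1.5,) for i in range(10_000)],
)

# Hypothetical query mix: a count, a filtered aggregate and a sorted top-10.
query_mix = [
    "SELECT COUNT(*) FROM orders",
    "SELECT AVG(amount) FROM orders WHERE amount > 100",
    "SELECT id FROM orders ORDER BY amount DESC LIMIT 10",
]

mean_seconds = average_query_time(conn, query_mix)
print(f"mean execution time: {mean_seconds * 1000:.3f} ms")
```

The printed figure varies from machine to machine, which is exactly why published benchmarks fix the hardware, the data set and the query mix.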

With execution time being a traditional measure of computing performance, we often use it in specifications and contracts to define requirements. Since keeping users engaged is important for interactive software, specifications often define a response time goal; this requirement is typically expressed in seconds. At the OS level, context switches and interrupt service routines must operate in sub-second execution times. For database-enabled applications and services, response time often depends on query execution time. For online transaction processing (OLTP), there is often a system requirement expressed as transactions per minute or transactions per second.

The anticipated response time for database applications and services varies with different workloads. Approving a credit card purchase is an example of OLTP involving short-running transactions that should execute in seconds. OLTP applications often use a timestamp column in a table to indicate the time of a transaction. Other types of temporal data that might be stored in databases include interval, valid time, transaction-start time and transaction-end time. TSQL2 provides temporal extensions to SQL that support not only transaction time, but also transaction-from and transaction-to times.
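As a small illustration of a timestamp column in an OLTP-style table, the sketch below records a purchase approval with a UTC timestamp. It uses Python's sqlite3 module; the `purchases` table, its columns and the `approve_purchase` helper are hypothetical names chosen for the example, and plain SQLite has none of the TSQL2 temporal extensions mentioned above.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchases (
        id          INTEGER PRIMARY KEY,
        card_last4  TEXT NOT NULL,
        amount      REAL NOT NULL,
        approved_at TEXT NOT NULL  -- ISO 8601 UTC time of the transaction
    )
    """
)


def approve_purchase(conn, card_last4, amount):
    """Insert one short-running transaction and return (row id, timestamp)."""
    stamp = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO purchases (card_last4, amount, approved_at) "
            "VALUES (?, ?, ?)",
            (card_last4, amount, stamp),
        )
    return cur.lastrowid, stamp


row_id, stamped = approve_purchase(conn, "4242", 19.99)
stored = conn.execute(
    "SELECT approved_at FROM purchases WHERE id = ?", (row_id,)
).fetchone()[0]
print(row_id, stored)
```

Storing the timestamp in UTC with an explicit offset is a design choice for the sketch; it keeps rows comparable when clients sit in different time zones.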

Not all database activity is a short-running transaction. Bulk loading of databases and ETL operations for a data warehouse can take hours. Analytical databases and data warehouses typically have a different type of workload than transaction processing databases. Behind the scenes of a business intelligence dashboard there can be a complex decision-support analysis that generates long-running queries. Besides a mix of workloads and transactions, factors such as virtualization and network latency can affect execution time and ultimately, distributed processing performance.

SQL queries were an early class of computing problem amenable to distributed processing using a client-server model. A client could generate a query and ship it across the network wire for a specialized database server to process and return results. Network communications, such as application layer protocols operating over network interface, Internet and transport protocols, are integral to client-server SQL processing. Today much SQL processing is done over standards-based Ethernet and TCP/IP networks, but the middleware and database wire protocols that drive communication between client and server vary by platform. Network communication using TCP/IP is a process that includes breaking data into packets for transmission across the Internet and re-assembling the packets at the other end of the network wire.
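The split-and-reassemble idea can be sketched in a few lines of Python. This is a toy model, not a TCP implementation: the `to_packets` and `reassemble` helpers are hypothetical, the sequence number is a simple packet index, and the 1460-byte segment size is an assumption based on a typical Ethernet payload after IP and TCP headers.

```python
import random


def to_packets(payload: bytes, max_segment: int = 1460):
    """Split a payload into (sequence number, chunk) pairs."""
    return [
        (seq, payload[offset:offset + max_segment])
        for seq, offset in enumerate(range(0, len(payload), max_segment))
    ]


def reassemble(packets):
    """Restore the original payload, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered)


# Hypothetical payload: a batch of SQL text large enough to need several packets.
payload = b"SELECT * FROM orders WHERE id = 42;" * 200
packets = to_packets(payload)

# Simulate out-of-order arrival at the other end of the wire.
arrived = packets[:]
random.shuffle(arrived)

restored = reassemble(arrived)
print(len(packets), restored == payload)
```

Real transport protocols also handle loss, retransmission and flow control, none of which this sketch attempts.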

Whether on local or wide-area networks, executing SQL queries over Ethernet and TCP/IP can involve multiple network round-trips, each of which adds latency. When SQL database products evolved to support client-server processing, DBMS companies such as Sybase, Oracle and IBM recognized the potential for network latency. They invented databases that embed logic, such as stored procedures and classes, thereby putting code closer to the data on which it operates. But there's still the question of application servers and database clients distributed across multiple computers, so the network remains an important factor in determining query processing performance.
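A back-of-the-envelope model shows why moving logic next to the data helps. The sketch below assumes a deliberately simple cost formula (round-trips multiplied by round-trip time, plus total server work) and made-up numbers: 1,000 statements, a 2 ms round-trip and 100 ms of total server processing. The `elapsed_ms` function is hypothetical, and real systems add parsing, serialization and queuing costs that the model ignores.

```python
def elapsed_ms(round_trips: int, rtt_ms: int, server_ms: int) -> int:
    """Estimated elapsed time: network round-trips plus total server work."""
    return round_trips * rtt_ms + server_ms


STATEMENTS = 1000   # assumed number of SQL statements in the unit of work
RTT_MS = 2          # assumed network round-trip time
SERVER_MS = 100     # assumed total server processing time for all statements

# One round-trip per statement, issued from the client.
chatty = elapsed_ms(STATEMENTS, RTT_MS, SERVER_MS)

# One round-trip to call a stored procedure that runs the same statements.
stored_proc = elapsed_ms(1, RTT_MS, SERVER_MS)

print(f"per-statement calls: {chatty} ms, stored procedure: {stored_proc} ms")
```

Under these assumptions the server does the same work in both cases; the difference comes entirely from the number of trips across the wire.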
