100 GB to 30,000 GB: Size, Speed, and Benchmarks
Anyone who's spent much time around software designers and developers knows they often compare the merits of various database systems. Those discussions usually focus on differentiators such as performance and ease of programming. This was true as long ago as the famous database debate between two ACM Turing Award winners, Charles Bachman and Edgar F. Codd.
Performance, data integrity, and ease of use are key characteristics to look for when comparing database systems. (In this context, ease of use is often a function of the tools available for modeling, design, and programming.)
When comparing database systems, we often make subjective judgments. At the same time, we yearn for an objective yardstick for measuring a database management system (DBMS) and database applications that use it.
Many in the database community have a strong interest in instrumentation and measurement of database performance. For developers and DBMS providers, performance is a characteristic we want to measure and quantify in terms that permit comparisons.
System architects are worth their weight in gold when they can identify and eliminate performance penalties. Numerous factors can hinder performance, including suboptimal hardware, shoddy database design, network latency, inferior database drivers, poor application coding, and queries in need of optimization.
Benchmarks have proven to be a valuable tool for studying performance under different database workloads. They can provide information that's useful for identifying and isolating problems. They can be used in combination with regression testing as a quality assurance tool for new software releases. But the most high-profile use of database benchmarks is providing a quantitative comparison of differences between hardware and software used for database work.
The computing industry funds the Transaction Processing Performance Council (TPC), which produces the foremost benchmarks for database performance measurement, but there are others.
The Open Source Database Benchmark (OSDB) is a community-maintained benchmark modeled after AS3AP, the ANSI SQL Scalable and Portable Benchmark. There has been very little activity in the OSDB community since the 2010 announcement of version 0.90. PolePosition is a benchmark for comparing combinations of database engines and object-relational mapping (ORM) technologies.
Yahoo! Research created a benchmark that's well known in the NoSQL community. The Yahoo! Cloud Serving Benchmark (YCSB) provides common workloads for comparing key-value stores and other cloud databases, such as Cassandra, HBase, and Sherpa. The benchmark is not designed for testing complex query operations, but it provides useful metrics for CRUD-type operations, such as read and write latency.
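To make the idea concrete, here is a minimal sketch (not YCSB's actual API) of what such a workload looks like: a mixed read/update loop against a key-value store, recording per-operation latency. A plain Python dict stands in for a real store such as Cassandra or HBase, and the 95/5 read/write split is an illustrative assumption.

```python
import random
import string
import time

def run_workload(store, num_ops=1000, read_fraction=0.95):
    """Hypothetical YCSB-style runner: mixed reads/updates, per-op latencies."""
    keys = [f"user{i}" for i in range(100)]
    for k in keys:
        store[k] = "x" * 100  # preload records before measuring

    read_latencies, write_latencies = [], []
    for _ in range(num_ops):
        key = random.choice(keys)
        start = time.perf_counter()
        if random.random() < read_fraction:
            _ = store[key]  # read path
            read_latencies.append(time.perf_counter() - start)
        else:
            # update path: overwrite the record with new random payload
            store[key] = "".join(random.choices(string.ascii_letters, k=100))
            write_latencies.append(time.perf_counter() - start)
    return read_latencies, write_latencies

reads, writes = run_workload({})
print(f"reads: {len(reads)}, writes: {len(writes)}")
```

Against a networked database, the same loop would surface the latency differences these benchmarks are designed to expose; the dict version merely shows the shape of the measurement.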
The gold standard of database benchmarks is the collection developed by the TPC. Computer manufacturers frequently publicize TPC results when they've established a new price-performance record. The people who develop the TPC benchmarks include representatives of member organizations such as Cisco, Dell, Fujitsu, HP, IBM, Intel, Microsoft, Oracle, Sybase, and VMware.