The SQL database has survived every paradigm shift critics said would be the death of SQL, including object-oriented programming (OOP), online analytical processing (OLAP), Internet computing and the World Wide Web. Some have suggested SQL platforms are not sufficiently scalable for large workloads or data volumes, but there's ample evidence to the contrary. The UPS shipping system central database processes 59 million transactions per hour. It has a table that contains more than 42 billion rows has achieved a peak workload of more than 1 billion SQL statements per hour with IBM DB2. The data warehouse at eBay, running on a Teradata system, contains 5 petabytes of data. LGR Telecommunications derives information from call records to feed a 310 TB Oracle data warehouse. At a recent conference, Microsoft reported Hotmail has 300 million users and processes more than 2 billion non-spam messages per day with Microsoft SQL Server running on a 10,000 server farm.
The SQL/XML database platforms provide a rich query model, supporting SQL, XQuery, XPath expressions, and SPARQL queries. Typically a key-value store requires logic in the application to perform record-oriented query processing. But instead of procedural programming, the SQL solution offers a declarative programming solution that relies on the query optimizer to generate the access path to the data. The SQL platforms offer mature administrative tools and standards-based connectivity, but the highest capacity SQL configurations have not yet been seen in the pay-per-use cloud.
IBM DB2 gives you a hybrid storage engine that supports transaction processing, business intelligence, and XML document processing. It currently holds several TPC benchmark records, including 1 million TPC-C transactions per minute on an 8-processor/64-core cluster running Red Hat Linux Advanced Server. But the ready-to-run Amazon EC2 AMIs aren't configured for that type of workload. The AMIs bundles are for running IBM DB2 Express Edition or Workgroup Edition and Informix Dynamic Server Express Edition and Workgroup Edition. For heavier lifting, you'll need to move your own DB2 Enterprise Edition or Informix Dynamic Server (IDS) licenses to EC2. Besides DB2 and Informix Dynamic Server, there are pre-packaged AMIs for IBM Lotus Web Content Management and WebSphere sMash. For DB2 or IDS development, IBM provides Developer AMIs for EC2 that have no DB2 or IDS usage charge.
Oracle users can transfer licenses to EC2 for Oracle 11g database, Fusion Middleware and Enterprise Manager. The company also provides ready-to-run AMIs and the Oracle Secure Backup Cloud Module can create compressed and encrypted database backups using Amazon S3. The S3 backups easily integrate with Oracle Recovery Manager using its SBT interface. The Oracle EC2 AMIs are pre-configured to use Enterprise Linux. The selection includes Oracle Database 10g Express Edition, Oracle Database 11g Enterprise Edition, Oracle Database 11g SE and WebLogic Server 10g. Oracle's licensing policy permits moving Fusion Middleware to EC2, including WebLogic Server, JRockit (Java VM), Coherence and the Tuxedo transaction processing monitor.
Oracle Coherence is an in-memory, distributed data grid that stores key-value pairs. It provides linear scalability (reportedly deployed in a 5000-node grid), replication, caching and transparent failover for distributed data. Coherence supports analysis and aggregation over the entire grid and it's available for C++, Java and .NET development. Oracle Real Application Clusters are not currently available on a public cloud provider
MySQL Enterprise is a platform suitable for cloud computing scenarios, such as scaling out with multiple servers and using master/slave replication. Some MySQL users have created a high-availability solution for Amazon EC2 by using a multi-instance master-master replication cluster. MySQL Enterprise subscribers can sign up for 24x7 support services for EC2, with different levels of support available from Sun. With the Platinum subscription, you get an enterprise dashboard, replication monitor, connectors, caching (memcached) and partitioning with MySQL Advanced. Continuent and Sun are working on making MySQL clustering technology available on cloud computing services such as GoGrid, Rackspace Mosso, and Amazon EC2.
EnterpriseDB Postgres Plus Cloud Edition is a version of PostgreSQL with enhancements such as GridSQL, replication, asynchronous pre-fetch for RAID, and a distributed memory cache. GridSQL uses a shared-nothing architecture to support parallel query processing for highly-scalable environments such as grids and clouds. For its cloud edition, EnterpriseDB partnered with Elastra, which had a SaaS offering with PostgreSQL and MySQL on Amazon and a product for management of clustered data warehouses. Elastra used Amazon S3 as the persistence solution to the ephemeral disk storage problem when trying to manage databases using EC2 instances.
In-Memory Databases, Cache
For applications that require extremely high throughput, in-memory databases and caches can be deployed to deliver performance. One solution is to pair an in-memory database with a disk-resident SQL database, with the former acting as a cache for the latter. TimesTen and solidDB are robust in-memory products that were acquired by Oracle and IBM respectively. Oracle TimesTen is an embeddable, in-memory database that supports ODBC and JDBC data access. It can provide real-time caching for and automatic synchronization with Oracle 11g databases. IBM solidDB maintains redundant copies of the database at all times and provides JDBC and ODBC query capabilities. It can scale using partitioning for instances, act as cache for SQL databases and do periodic snapshots (checkpoints to disk). The GigaSpaces XAP builds on the tuple spaces shared memory model and offers a JDBC capability for SQL queries.
To improve responsiveness, high-volume web sites typically use cache to reduce the number of queries against SQL databases. Ehcache is a distributed Java cache used by LinkedIn. The memcached server provides a distributed object cache often used for MySQL applications. JBoss Cache has been integrated with GridGain in the Open Cloud Platform. Grid Dynamics demonstrated linear scalability with the GridGain platform from 2 to 512 nodes using Amazon EC2 to run Monte Carlo simulations. Quetzall CloudCache is targeted at cloud applications hosted on EC2. CloudCache returns data in JSON or XML format and it can run in multiple EC2 regions. It offers a REST-style API and there are bindings for Ruby, Python, PHP, and Java development. Microsoft is currently previewing a distributed, in-memory cache code-named Velocity. It supports retrieving data by key or tag, optimistic and pessimistic concurrency, and it includes an ASP.NET session provider.