Ken North is an author, consultant and industry analyst. He teaches seminars and has chaired the XML Devcon 200x conference series and the Nextware, LinkedData Planet and DataServices World 200x conferences.
Cloud computing is the latest sea change in how we develop and deploy services and applications, and in how we meet the need for persistent information and database solutions. Database technology evolves even as new computing models emerge, which inevitably raises questions about selecting the right database technology to match the new requirements.
The cloud is an elastic computing and data storage engine, a virtual network of servers, storage devices and other computing resources. It's a major milestone in on-demand or utility computing, the evolutionary progeny of computer timesharing, high-performance networks and grid computing. The computer timesharing industry that emerged four decades ago pioneered the model for on-demand computing and pay-per-use resource sharing of storage and applications. More recently, Ian Foster and Carl Kesselman advanced the concept of the grid to make large-scale computing networks accessible via a service model. Like computer timesharing and the grid, cloud computing often requires persistent storage, so open source projects and commercial companies have responded with data store and database solutions.
- Public clouds are operated by commercial enterprises that can host applications and databases, offering Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS) and Database as a Service (DaaS). Infrastructure providers include Amazon Elastic Compute Cloud (EC2), GoGrid, Rackspace Mosso, and Joyent, whereas Microsoft Azure, Google App Engine, Force.com, Zoho, and Facebook are platform providers. There are also providers targeting specific classes of cloud users, such as HP CloudPrint and IBM LotusLive for collaboration services and social networking for businesses. Other SaaS providers include Birst and SAS for on-demand business intelligence (BI); Salesforce.com and Zoho for customer relationship management (CRM); and Epicor, NetSuite, SAP Business ByDesign and Workday for enterprise resource planning (ERP) suites. DaaS providers include EnterpriseDB, FathomDB, Longjump, and TrackVia.
- Private clouds, like server consolidation, clusters, and virtualization, are another evolutionary step in data center and grid technology. Gartner Research predicted that government will have the largest private clouds, but any organization with thousands of servers and massive storage requirements is a likely candidate. Security and reliability are the appeal of private clouds for large enterprises that can afford the infrastructure. Public cloud computing does not provide the 99.99% uptime that enterprise data center managers want for service level agreements. Because a private cloud sits behind a firewall, it mitigates the risk of exposing data to the cloud, and it alleviates concerns about data protection in multi-tenant cloud environments. One issue in the private versus public cloud debate is the diversity of APIs used to invoke cloud services. This has spurred interest in creating a standard, but the Eucalyptus initiative took a different approach: treating the Amazon APIs as a de facto standard, it developed private cloud software that is largely compatible with the Amazon EC2 APIs.
When evaluating the suitability of a database solution for cloud computing, there are multiple considerations.
- First, you must consider the class of applications that will be served: business intelligence (BI), e-commerce transactions, knowledge bases, collaboration and so on.
- Second, you must determine suitability for public and/or private clouds.
- Third, you must consider ease of development.
And, of course, budget is not to be overlooked.
Mission: What Will Run In the Cloud?
Selecting a database manager should be a function of the mission and applications it must support, not just budget and whether it will run in the enterprise or a private or public cloud.
Some organizations use a cloud provider as a backup for mission-critical applications or databases, not as the primary option for deploying applications or services. Oracle database users can run backup software that uses Amazon Simple Storage Service (S3) for Oracle database backups. For an even bigger safety net, organizations can look to cloud computing as a disaster recovery option.
The New York Times project that created the TimesMachine, a web-accessible digital archive, is a prime example of a one-off cloud project requiring massively scalable computing. But factors besides on-demand elasticity come into play when the goal is hosting applications in the cloud on a long-term basis, particularly database applications.
Cloud users are often looking to deploy applications and databases with a highly scalable, on-demand architecture, often on a pay-per-use basis. Common scenarios for using the cloud include startups and project-based, ad hoc efforts that want to ramp up quickly with minimal investment in infrastructure. But Amazon's public cloud has also been used to support e-business web sites such as Netflix.com, eHarmony.com and Target.com. E-mail is a backbone of modern business, and companies such as the Boston Celtics have gone to a cloud computing model for e-mail and collaboration software. Companies can also opt to use a cloud to host ERP or CRM suites that operate with SQL databases, such as open source ERP and CRM suites (Compiere, Openbravo, SugarCRM) and BI solutions (Jasper, Pentaho). Because data warehouses draw their source data from operational systems, organizations using the cloud to host operational databases are likely to do the same for data warehouses and business intelligence.
On a pay-as-you-go basis, the cloud handles provisioning on demand. Machine images, IP addresses and disk arrays are not permanently assigned, but databases on a public cloud can be assigned to persistent storage. This avoids having to bulk load a database each time you fire up machine instances and run your application. But it also puts a premium on database security and on the cloud provider having a robust security model for multi-tenant storage.
The cloud is particularly well suited for processing large data sets and for compute-intensive applications that benefit from parallel processing, such as video rendering and data analysis. Early Amazon EC2 users included biomedical researchers and pharmaceutical, bioengineering and banking institutions. They were early adopters of grid computing for purposes such as financial modeling, drug discovery and other research. Medical research often requires massive computation for genetic sequence analysis and simulations of molecular interactions. This work has been done on grids, often using Basic Local Alignment Search Tool (BLAST) programs, and more recently on clouds. Researchers have also used MapReduce software in the cloud for genetic sequence analysis. Eli Lilly uses Amazon EC2 for processing bioinformatics sequence information.
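To illustrate why MapReduce suits sequence analysis, the sketch below counts k-mers (short subsequences of length k) across a set of reads using the three MapReduce phases: map, shuffle and reduce. It is a minimal single-process simulation, not Hadoop or any cloud provider's service; the function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) and the sample reads are hypothetical. In a real cluster, each map call would run on a different node and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read_id, sequence, k=3):
    """Map phase: emit (k-mer, 1) for every overlapping k-mer in one read.

    read_id is unused here; it mirrors the (key, value) input of a map task.
    """
    seq = sequence.upper()
    return [(seq[i:i + k], 1) for i in range(len(seq) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(kmer, counts):
    """Reduce phase: sum the partial counts for a single k-mer."""
    return kmer, sum(counts)


def count_kmers(reads, k=3):
    # Hypothetical driver: runs all map tasks, shuffles, then runs one reduce per key.
    mapped = chain.from_iterable(map_kmers(rid, seq, k) for rid, seq in reads.items())
    grouped = shuffle(mapped)
    return dict(reduce_counts(kmer, vals) for kmer, vals in grouped.items())


if __name__ == "__main__":
    reads = {"read1": "ACGTACGT", "read2": "CGTACG"}
    # prints [('ACG', 3), ('CGT', 3), ('GTA', 2), ('TAC', 2)]
    print(sorted(count_kmers(reads).items()))
```

Because each map call touches only one read and each reduce call touches only one key, both phases can be spread across as many machine instances as the job needs. That independence is what makes this pattern a good fit for elastic, pay-per-use capacity.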
Cloud computing is also used for other purposes, such as integrating SaaS and enterprise systems. Players in the on-demand integration space include Boomi, Cast Iron Systems, Hubspan, Informatica, Jitterbit and Pervasive Software. Business intelligence (BI) activity, such as analytics, data warehousing and data mining, requires computing horsepower and a capital outlay that might be prohibitive for small and medium businesses. Cloud computing offers an attractive pay-per-use alternative, and there appears to be a large potential market for BI on demand.
The marriage of cloud computing and business intelligence can be accomplished by several means. One option is to have data and applications hosted by a SaaS provider. Another is to create cloud-based applications hosted by an infrastructure provider. A third alternative is to do both, using data replication or a data integration suite to keep the SaaS-hosted data and the cloud-hosted applications in sync.
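For the third option, one common replication technique is incremental, timestamp-based copying: each run transfers only the rows changed since the last successful sync and records a new high-water mark. The sketch below assumes every row carries a monotonically increasing `updated_at` value. The `Row` type and the `replicate` function are hypothetical, and plain dictionaries stand in for the SaaS data store and the cloud-hosted database. It deliberately ignores deletes and conflicting updates, which a real replication or data integration product would have to handle.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    value: dict
    updated_at: int  # assumed monotonically increasing change timestamp


def replicate(source, target, last_sync):
    """Copy rows changed after last_sync from source to target.

    Returns the new high-water mark to use as last_sync on the next run.
    Deletes and conflicting writes are not handled in this sketch.
    """
    watermark = last_sync
    for row in source.values():
        if row.updated_at > last_sync:
            target[row.key] = row
            watermark = max(watermark, row.updated_at)
    return watermark


if __name__ == "__main__":
    saas_store = {"c1": Row("c1", {"name": "Acme"}, 5),
                  "c2": Row("c2", {"name": "Globex"}, 7)}
    cloud_db = {}
    mark = replicate(saas_store, cloud_db, 0)      # initial full copy
    saas_store["c1"] = Row("c1", {"name": "Acme Corp"}, 9)
    mark = replicate(saas_store, cloud_db, mark)   # copies only the changed row
    print(mark, cloud_db["c1"].value)
```

Storing the watermark between runs lets each sync cost scale with the volume of changes rather than the size of the whole data set.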