Ken North is an author, consultant, and analyst. He chaired the XML DevCon 200x conference series, Nextware, LinkedData Planet, and DataServices World 200x.
The emergence of cloud computing raises a host of questions about the best database technology to use with this new model for on-demand computing. Ultimately, the cloud approach a company chooses determines the data management options that are available to it.
When evaluating the suitability of a database manager for cloud computing, there are three basic steps:
- Consider the class of applications that will be served: data asset protection, business intelligence, e-commerce, etc.
- Determine the suitability of these apps for public or private clouds.
- Factor in ease of development.
The database manager you choose should follow from the mission and the applications it supports, not from budgets or from whether it will run in a private cloud inside the enterprise or in a public cloud from a service provider. For instance, some companies turn to a cloud provider to back up mission-critical databases or as a disaster recovery option. Database-intensive apps such as business intelligence can be deployed in the cloud by having a SaaS provider host the data and the app, by having an infrastructure provider host a cloud-based app, or by combining these approaches. And popular solutions for processing very large data sets, such as Hadoop MapReduce, can run in both private and public clouds.
Databases, data stores, and data access software should be evaluated for suitability for both public and private clouds. Public cloud security isn't adequate for some types of applications. For example, Amazon Dynamo was built to operate in a trusted environment, without authentication and authorization requirements. At a minimum, database communications and backups to the cloud need to be encrypted.
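As one illustration of encrypting database communications, the sketch below builds a TLS client context with Python's standard `ssl` module. The helper name `make_db_tls_context` and the `ca_file` parameter are assumptions made for this example. How the context is handed to a particular database driver varies by driver and is not shown.

```python
import ssl


def make_db_tls_context(ca_file=None):
    """Return a TLS client context suitable for an encrypted database link.

    ca_file is a hypothetical path to the CA bundle that signed the
    database server's certificate; None falls back to the system defaults.
    """
    # create_default_context() turns on certificate verification and
    # hostname checking for client-side connections.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


context = make_db_tls_context()
```

Backups need separate treatment: they should be encrypted before they leave the premises, which requires a cryptography library outside the Python standard library.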
Security in cloud environments varies based on whether you use SaaS, a platform provider, or an infrastructure provider. SaaS providers bundle tools, APIs, and services, so you don't have to worry about choosing the optimal data store and security model. But if you create a private cloud or use an infrastructure provider, you'll have to select a data management tool that's consistent with your app's security needs. Your database decision will also hinge on whether the environment supports a multitenant or multi-instance model. Salesforce.com hosts apps on Oracle databases using multitenancy. Amazon EC2 supports multi-instance security. If you fire up an Amazon Machine Image running Oracle, DB2, or Microsoft SQL Server, you have a unique instance that doesn't serve other tenants. You have to authorize database users, define roles, and grant user privileges when using the infrastructure-as-a-service model.
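The administrative work that falls to you under the infrastructure-as-a-service model (defining roles, granting privileges, authorizing users) can be pictured with a small in-memory model. This is only a sketch: the `AccessControl` class and its method names are invented for illustration and do not correspond to the API of Oracle, DB2, SQL Server, or any other product.

```python
class AccessControl:
    """Toy role-based access model; not any real database's API."""

    def __init__(self):
        self._role_privileges = {}  # role name -> set of (privilege, object)
        self._user_roles = {}       # user name -> set of role names

    def define_role(self, role):
        """Create a role with no privileges if it does not exist yet."""
        self._role_privileges.setdefault(role, set())

    def grant(self, role, privilege, obj):
        """Grant a privilege such as SELECT on a named object to a role."""
        if role not in self._role_privileges:
            raise KeyError(f"undefined role: {role}")
        self._role_privileges[role].add((privilege.upper(), obj))

    def authorize_user(self, user, role):
        """Attach an existing role to a user."""
        if role not in self._role_privileges:
            raise KeyError(f"undefined role: {role}")
        self._user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, privilege, obj):
        """True if any of the user's roles carries the privilege."""
        wanted = (privilege.upper(), obj)
        return any(
            wanted in self._role_privileges[role]
            for role in self._user_roles.get(user, ())
        )


acl = AccessControl()
acl.define_role("analyst")
acl.grant("analyst", "select", "sales")
acl.authorize_user("dana", "analyst")
```

In a multi-instance deployment each instance carries its own copy of this bookkeeping. In a multitenant deployment the provider maintains it across tenants.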
For new applications hosted in the cloud, developers look primarily to classes of data stores such as SQL/XML databases, column data stores, distributed hash tables, and tuple space variants, such as in-memory databases, entity-attribute-value stores, and other non-SQL databases. Choosing the right data store depends on the application's requirements for scalability, load balancing, consistency, data integrity, transaction support, and security. Some newer data stores have taken a minimalist approach, avoiding joins and not implementing schemas or strong typing; instead, they store data as strings or blobs. Scalability is important for very large data sets and has contributed to the recent enthusiasm for distributed hash tables and distributed key-value stores.
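To make the distributed key-value idea concrete, here is a minimal sketch of a store that places schemaless byte values on nodes with a consistent-hash ring. The `HashRing` and `KeyValueStore` classes, the node names, and the choice of 64 virtual points per node are all assumptions for illustration. Real products differ in partitioning, replication, and consistency handling.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping string keys to node names."""

    def __init__(self, nodes, points_per_node=64):
        ring = []
        for node in nodes:
            # Several points per node even out the key distribution.
            for i in range(points_per_node):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._owners = [n for _, n in ring]

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._owners[idx % len(self._owners)]


class KeyValueStore:
    """Minimalist store: no schema, no joins, values are opaque bytes."""

    def __init__(self, nodes):
        self._ring = HashRing(nodes)
        self._shards = {node: {} for node in nodes}

    def put(self, key, value):
        self._shards[self._ring.node_for(key)][key] = value

    def get(self, key):
        return self._shards[self._ring.node_for(key)].get(key)


store = KeyValueStore(["node-a", "node-b", "node-c"])
store.put("user:1", b'{"name": "Ada"}')
```

The design choice that matters for scalability is that adding a node to the ring reassigns only the keys that land on the new node, leaving every other key where it was.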
One interesting capability is configuring fault-tolerant systems and hot backups for disaster recovery. A private cloud can be configured and operated with fairly seamless failover to Amazon EC2, for example. You'll have to replicate data in the private and public cloud, implementing the Amazon APIs and availability zones, as well as IP assignment and load balancing for the private cloud. You'll also have to use server configurations compatible with Amazon instances to avoid breaking applications and services because of changes in endianness, the Java heap size, and other dissimilarities.
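The replicate-then-fail-over pattern can be sketched in a few lines. The `Site` and `FailoverStore` classes below are hypothetical stand-ins for a private-cloud data store and its public-cloud replica. They model only the control flow and make no calls to Amazon APIs.

```python
class SiteDown(Exception):
    """Raised when a site cannot serve a request."""


class Site:
    """Stand-in for one deployment of the data store."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self._data = {}

    def write(self, key, value):
        if not self.available:
            raise SiteDown(self.name)
        self._data[key] = value

    def read(self, key):
        if not self.available:
            raise SiteDown(self.name)
        return self._data[key]


class FailoverStore:
    """Writes to every reachable site; reads from the first one that answers."""

    def __init__(self, primary, standby):
        self._sites = [primary, standby]

    def write(self, key, value):
        written = 0
        for site in self._sites:
            try:
                site.write(key, value)
                written += 1
            except SiteDown:
                # A site that misses a write is stale until it is resynced;
                # that resync step is outside this sketch.
                continue
        if written == 0:
            raise SiteDown("all sites unavailable")

    def read(self, key):
        for site in self._sites:
            try:
                return site.name, site.read(key)
            except SiteDown:
                continue
        raise SiteDown("all sites unavailable")


private = Site("private")
public = Site("public")
store = FailoverStore(private, public)
store.write("order:42", "shipped")
```

Marking `private.available = False` and reading again returns the value from the public replica, which is the behavior a hot backup is meant to provide.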
In short, the cloud is an effective elastic computing and data storage engine, but matching the right platform with the right database is critical. Doing this correctly requires evaluating the job and its security needs, as well as assessing how easy it is to design and implement the software. Carefully weighing these factors will lead you to the right conclusion.