Dr. Dobb's is part of the Informa Tech Division of Informa PLC

3 Steps To Managing Data In The Cloud

Ken North is an author, consultant, and analyst. He chaired the XML DevCon 200x conference series, Nextware, LinkedData Planet, and DataServices World 200x.

The emergence of cloud computing raises a host of questions about the best database technology to use with this new model for on-demand computing. Ultimately, the cloud approach a company chooses determines the data management options that are available to it.

When evaluating the suitability of a database manager for cloud computing, there are three basic steps:

  • Consider the class of applications that will be served: data asset protection, business intelligence, e-commerce, etc.
  • Determine the suitability of these apps for public or private clouds.
  • Factor in ease of development.

The database manager you choose should follow from the mission and the applications it supports, not from budgets or from whether it will run in the enterprise as a private cloud or at a service provider as a public cloud. For instance, some companies turn to a cloud provider to back up mission-critical databases or as a disaster recovery option. Database-intensive apps such as business intelligence can be deployed in the cloud by having a SaaS provider host the data and the app, by having an infrastructure provider host a cloud-based app, or by combining these approaches. And popular solutions for processing very large data sets, such as Hadoop MapReduce, can run in both private and public clouds.

Databases, data stores, and data access software should be evaluated for suitability for both public and private clouds. Public cloud security isn't adequate for some types of applications. For example, Amazon Dynamo was built to operate in a trusted environment, without authentication and authorization requirements. At a minimum, database communications and backups to the cloud need to be encrypted.
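To make the encryption requirement concrete, here is a minimal sketch of the client side using Python's standard ssl module. It assumes the database driver in use accepts a standard SSLContext; whether it does, and under which parameter name, varies by driver, so treat the helper name make_strict_tls_context as illustrative only.

```python
import ssl

def make_strict_tls_context() -> ssl.SSLContext:
    """Build a client TLS context that verifies the server and rejects old protocols."""
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

context = make_strict_tls_context()
# The context would then be handed to the database driver's connect call
# (the parameter name depends on the driver and is not shown here).
```

Backups need separate treatment: encrypting the connection protects data in transit, not the backup file once it is stored with the provider.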

Security in cloud environments varies based on whether you use SaaS, a platform provider, or an infrastructure provider. SaaS providers bundle tools, APIs, and services, so you don't have to worry about choosing the optimal data store and security model. But if you create a private cloud or use an infrastructure provider, you'll have to select a data management tool that's consistent with your app's security needs. Your database decision also will hinge on whether the environment supports a multitenant or multi-instance model. Salesforce.com hosts apps on Oracle databases using multitenancy. Amazon EC2 supports the multi-instance model: if you fire up an Amazon Machine Image running Oracle, DB2, or Microsoft SQL Server, you have a unique instance that doesn't serve other tenants. With the infrastructure-as-a-service model, you have to authorize database users, define roles, and grant user privileges yourself.
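The difference between the two models can be sketched with Python's built-in sqlite3 module. This is a toy illustration, not how Salesforce.com or Amazon EC2 implement anything: the orders table, the tenant names, and the helper functions are all hypothetical, and SQLite has no users, roles, or GRANT statements, so the privilege setup mentioned above is not shown.

```python
import sqlite3

ROWS = [("acme", "widget"), ("globex", "gear"), ("acme", "bolt")]

# Multitenant: one shared database; every row carries a tenant identifier.
shared = sqlite3.connect(":memory:")
shared.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT NOT NULL)")
shared.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)

def tenant_orders(conn, tenant_id):
    """Isolation depends on every query filtering by tenant_id."""
    rows = conn.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item", (tenant_id,)
    )
    return [item for (item,) in rows]

# Multi-instance: each tenant gets its own database, so no tenant column is needed.
instances = {}
for tenant, item in ROWS:
    if tenant not in instances:
        instances[tenant] = sqlite3.connect(":memory:")
        instances[tenant].execute("CREATE TABLE orders (item TEXT NOT NULL)")
    instances[tenant].execute("INSERT INTO orders VALUES (?)", (item,))

def instance_orders(tenant_id):
    """Isolation comes from the separate database, not from a WHERE clause."""
    rows = instances[tenant_id].execute("SELECT item FROM orders ORDER BY item")
    return [item for (item,) in rows]
```

In the shared table, a single query that forgets the tenant filter exposes other tenants' rows. The separate instances avoid that risk at the cost of more databases to administer.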

Developers' Choices

Database app development options for public cloud computing can be limited by the providers. Platform offerings such as Google App Engine and Force.com provide specific development platforms with predefined APIs and data stores. Private clouds and infrastructure providers, including GoGrid and Amazon EC2, let users match the software, database environment, and APIs to their needs. Besides cloud storage APIs, developers can program to various data store APIs, to standard APIs for SQL/XML databases, and to APIs for cloud services. For Amazon, that involves using Web Services Description Language and invoking specific Web services. For projects that use the cloud to power Web 2.0 apps, developers can use JavaScript Object Notation and the Atom Publishing Protocol.

For new applications hosted in the cloud, developers look primarily to several classes of data stores: SQL/XML databases, column data stores, distributed hash tables, and tuple-space variants such as in-memory databases, entity-attribute-value stores, and other non-SQL databases. Choosing the right data store depends on the application's scalability, load balancing, consistency, data integrity, transaction support, and security requirements. Some newer data stores have taken a minimalist approach, avoiding joins and not implementing schemas or strong typing; instead, they store data as strings or blobs. Scalability is important for very large data sets and has contributed to the recent enthusiasm for distributed hash tables and distributed key-value stores.
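To show why these stores scale out easily, here is a minimal sketch of a key-value store that spreads keys across nodes with consistent hashing, one common technique behind distributed hash tables. It is an in-memory toy under stated assumptions: the class names HashRing and KeyValueStore, the node names, and the replica count are all hypothetical, and values are opaque byte strings with no schema, joins, or typing.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring that maps string keys to node names."""

    def __init__(self, nodes, replicas=64):
        # Each node is placed at several points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

class KeyValueStore:
    """Schema-less store: keys are strings, values are opaque blobs."""

    def __init__(self, nodes):
        self._ring = HashRing(nodes)
        self._data = {node: {} for node in nodes}  # one dict per simulated node

    def put(self, key, value: bytes):
        self._data[self._ring.node_for(key)][key] = value

    def get(self, key):
        return self._data[self._ring.node_for(key)].get(key)

store = KeyValueStore(["node-a", "node-b", "node-c"])
store.put("user:42", b"opaque blob")
```

Adding a node to the ring moves only the keys that now hash to the new node; the rest stay put, which is what makes growing the cluster cheap. The sketch leaves out replication, consistency, and transactions, the very requirements the paragraph above asks you to weigh.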

One interesting option is to configure fault-tolerant systems and hot backups for disaster recovery. A private cloud can be configured and operated with fairly seamless failover to Amazon EC2, for example. You'll have to replicate data in the private and public cloud, implementing the Amazon APIs and availability zones, as well as IP assignment and load balancing for the private cloud. You'll also have to use server configurations compatible with Amazon instances to avoid breaking applications and services because of differences in endianness, Java heap size, and other dissimilarities.
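The replicate-and-fail-over idea can be sketched in a few lines. This is an in-memory simulation only: Site, SiteDown, and FailoverStore are hypothetical names, nothing here calls the Amazon APIs, and a real deployment would also need the IP assignment, load balancing, and configuration matching described above.

```python
class SiteDown(Exception):
    """Raised by a simulated site that is unavailable."""

class Site:
    """In-memory stand-in for a data store at one location."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self._rows = {}

    def write(self, key, value):
        if not self.available:
            raise SiteDown(self.name)
        self._rows[key] = value

    def read(self, key):
        if not self.available:
            raise SiteDown(self.name)
        return self._rows[key]

class FailoverStore:
    """Writes go to both sites; reads fall back to the standby."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def write(self, key, value):
        # Both writes must succeed. A real system would also have to
        # roll back or retry when only one of them goes through.
        self.primary.write(key, value)
        self.standby.write(key, value)

    def read(self, key):
        try:
            return self.primary.read(key)
        except SiteDown:
            return self.standby.read(key)

private_cloud = Site("private")
public_cloud = Site("public")
store = FailoverStore(private_cloud, public_cloud)
store.write("order:1", "paid")

private_cloud.available = False      # simulate an outage in the private cloud
recovered = store.read("order:1")    # served from the public-cloud copy
```

Writing to both sites before acknowledging keeps the standby current, but every write then pays the latency of the slower site. Asynchronous replication trades that cost for the risk of losing the most recent writes during a failover.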

In short, the cloud is an effective elastic computing and data storage engine, but matching the right platform with the right database is critical. Doing this correctly requires evaluating the job and its security needs, as well as assessing how easy it is to design and implement the software. Carefully weighing these factors will lead you to the right conclusion.
