Ken North is an author, consultant, industry analyst, and database specialist.
A new generation of low-cost, high-performance database software is rapidly emerging to challenge SQL's dominance in distributed processing and Big Data applications. Some companies have already traded SQL's rich functionality for these new options that let them create, work with, and manage large data sets.
A big reason for this movement, dubbed NoSQL, is that different implementations of Web, enterprise, and cloud computing applications have different requirements of their databases. Not every app requires rigid data consistency, for example.
Also, when an application uses data distributed across hundreds or even thousands of servers, simple economics points to using no-cost server software as opposed to paying per-processor license fees. Once freed from license fees, you can scale horizontally with commodity hardware or opt for a cloud computing service and avoid the capital expenses altogether. Database software licensed per processor didn't always make that kind of scale-out practical.
Challenges to SQL's hegemony are coming from specialized products built from the ground up for large-scale analytics and document storage, as well as for building operational systems that require high availability more than consistency when partitioning data.
Applications such as online transaction processing, business intelligence, customer relationship management, document processing, and social networking don't have identical needs for data, query, or index types, nor do they have equivalent requirements for consistency, scalability, and security.
For example, BI applications run analytical and decision-support queries that can exploit bitmap indexes for operations with gigabyte- or terabyte-sized databases. Web analytics, drug discovery, financial modeling, and similar applications look to distributed systems for efficiently processing gigabyte- to terabyte-sized data sets. OLTP puts a premium on reliability. And high-traffic Web applications such as Facebook and Amazon.com have adopted BASE (basically available, soft state, eventually consistent) properties over the more familiar ACID (atomicity, consistency, isolation, durability) ones to serve Web user communities that number in the millions.
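To make the BASE trade-off concrete, here is a minimal sketch of eventual consistency in Python. It isn't how any particular product works; the class names (Replica, EventuallyConsistentStore), the last-write-wins rule, and the explicit sync() step standing in for background replication are all assumptions chosen for illustration.

```python
class Replica:
    """One copy of the data; stores (timestamp, value) per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, timestamp, value):
        # Last-write-wins (an assumed conflict rule): keep the newest update.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    """Acknowledges a write after one replica has it; the rest catch up later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = []  # undelivered updates: (replica, key, timestamp, value)
        self.clock = 0     # simple logical clock used to order writes

    def write(self, key, value):
        self.clock += 1
        first, *others = self.replicas
        first.apply(key, self.clock, value)  # available immediately
        for replica in others:
            self.pending.append((replica, key, self.clock, value))

    def sync(self):
        # Stand-in for asynchronous replication between servers.
        for replica, key, timestamp, value in self.pending:
            replica.apply(key, timestamp, value)
        self.pending.clear()

    def read_from(self, index, key):
        return self.replicas[index].read(key)


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("cart:42", ["book"])
stale = store.read_from(1, "cart:42")  # replica "b" has not seen the write yet
store.sync()
fresh = store.read_from(1, "cart:42")  # after replication, the replicas agree
print(stale, fresh)
```

The read between write() and sync() returns nothing, which is the "soft state" window an ACID system would not allow. An application that can tolerate that window gets a write path that never waits on every replica.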
These differences are one reason non-relational NoSQL data stores, document-centric databases, and column stores have gained traction. They're specialized tools, in contrast to the Swiss Army knife functionality of SQL platforms.
System architects should consider the specialized features and functions an app needs in choosing a database. NoSQL databases can be built specifically for functions such as BI, OLTP, CRM, social networks, and data warehousing, and include features such as scalability, partitioning, security, and elasticity.
Scalability And High Availability
For cloud computing and high-volume Web sites such as eBay, Amazon, Twitter, and Facebook, scalability and high availability are essential. In fact, they're the reason distributed databases have relaxed consistency requirements.
Operational systems in high-availability environments must survive software, server, and network segment failures, and provide scalability despite unpredictable surges in demand for computing resources. One approach to building such systems is to use distributed databases with a shared-nothing architecture and horizontal partitioning. Elasticity and sharding (partitioning) -- both NoSQL features -- are solutions for scaling out horizontally to provide availability and for processing Big Data.
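Horizontal partitioning is easy to sketch. The Python example below is an illustration under stated assumptions, not any vendor's implementation: the ShardedStore class is hypothetical, each "shard" is an in-memory dictionary standing in for a separate shared-nothing server, and keys are placed by hashing them and taking the result modulo the number of shards.

```python
import hashlib


class ShardedStore:
    """Key-value store that spreads keys across independent shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one server that shares nothing with the others.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # A stable hash (unlike Python's built-in hash(), which is salted per
        # process for strings) so a key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

sizes = [len(shard) for shard in store.shards]
print(sizes)                 # how many keys landed on each shard
print(store.get("user:7"))   # routed to the one shard that holds the key
```

Every read or write touches exactly one shard, which is what lets capacity grow by adding servers. The catch with simple modulo placement is that changing the number of shards remaps most keys, which is why distributed stores often turn to schemes such as consistent hashing instead.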
A variety of data stores are gaining popularity for building scalable Web sites and applications for elastic environments such as the private or public cloud. Distributed key-value stores are great when you don't need SQL rule enforcement, strong consistency, complex queries, integrated queuing, or the ability to work with operational databases larger than available RAM.
New low-latency data stores provide scalability for applications that don't require rich query and analytics capabilities. Amazon has developed SimpleDB and Dynamo, a proprietary store it built for its own services, and Google developed Bigtable. Low-latency, open source options include Cassandra, Hypertable, MongoDB, Project Voldemort, Redis, and Tokyo Tyrant. For a sense of scale, Amazon S3 was hosting 102 billion objects as of March.