Channels ▼
RSS

Database

Do All Roads Lead Back to SQL?


Much has been made in the past several years about SQL versus NoSQL and which model is better suited to modern, scale-out deployments. Lost in many of these arguments is the raison d'être for SQL and the difference between model and implementation. As new architectures emerge, the question is why SQL endures and why there is such a renewed interest in it today.

Background

In 1970, Edgar Codd captured his thoughts on relational logic in a paper that laid out rules for structuring and querying data. A decade later, the Structured Query Language (SQL) began to emerge. While not entirely faithful to Codd's original rules, it provided relational capabilities through a mostly declarative language and helped solve the problem of how to manage growing quantities of data.

Over the next 30 years, SQL evolved into the canonical data-management language, thanks largely to the clarity and power of its underlying model and transactional guarantees. For much of that time, deployments were dominated by scale-up or "vertical" architectures, in which increased capacity comes from upgrading to bigger, individual systems. Unsurprisingly, this is also the design path that most SQL implementations followed.

The term "NoSQL" was coined in 1998 by a database that provided relational logic but eschewed SQL. It wasn't until 2009 that this term took on its current, non-ACID meaning. By then, typical deployments had already shifted to scale-out or "horizontal" models. The perception was that SQL could not provide scale-out capability, and so new non-SQL programming models gained popularity.

Fast-forward to 2013 and after a period of decline, SQL is regaining popularity in the form of NewSQL implementations. Arguably, SQL never really lost popularity (the market is estimated at $30 billion and growing), it just went out of style. Either way, this new generation of systems is stepping back to look at the last 40 years and understand what that tells us about future design by applying the power of relational logic to the requirements of scale-out deployments.

Why SQL?

SQL evolved as a language because it solved concrete problems. The relational model was built on capturing the flow of real-world data. If a purchase is made, it relates to some customer and product. If a song is played, it relates to an artist, an album, a genre, and so on. By defining these relations, programmers know how to work with data, and the system knows how to optimize queries. Once these relations are defined, then other uses of the data (audit, governance, etc.) are much easier.

Layered on top of this model are transactions. Transactions are boundaries guaranteeing the programmer a consistent view of the database, independent execution relative to other transactions, and clear behavior when two transactions try to make conflicting changes. That's the A (atomicity), C (consistency), and I (isolation) in ACID. To say a transaction has committed means that these rules were met, and that any changes were made Durable (the D in ACID). Either everything succeeds or nothing is changed.

Transactions were introduced as a simplification. They free developers from having to think about concurrent access, locking, or whether their changes are recorded. In this model, a multithreaded service can be programmed as if there were only a single thread. Such programming simplification is extremely useful on a single server. When scaling across a distributed environment, it becomes critical.

With these features in place, developers building on SQL were able to be more productive and focus on their applications. Of particular importance is consistency. Many NoSQL systems sacrifice consistency for scalability, putting the burden back on application developers. This trade-off makes it easier to build a scale-out database, but typically leaves developers choosing between scale and transactional consistency.

Why Not SQL?

It's natural to ask why SQL is seen as a mismatch for scale-out architectures, and there are a few key answers. The first is that traditional SQL implementations have trouble scaling horizontally. This has led to approaches like sharding, passive replication, and shared-disk clustering. The limitations are functions of designing around direct disk interaction and limited main memory, however, and not inherent in SQL.

A second issue is structure. Many NoSQL systems tout the benefit of having no (or a limited) schema. In practice, developers still need some contract with their data to be effective. It's flexibility that's needed — an easy and efficient way to change structure and types as an application evolves. The common perception is that SQL cannot provide this flexibility, but again, this is a function of implementation. When table structure is tied to on-disk representation, making changes to that structure is very expensive; whereas nothing in Codd's logic makes adding or renaming a column expensive.

Finally, some argue that SQL itself is too complicated a language for today's programmers. The arguments on both sides are somewhat subjective, but the reality is that SQL is a widely used language with a large community of programmers and a deep base of tools for tasks like authoring, backup, or analysis. Many NewSQL systems are layering simpler languages on top of full SQL support to help bridge the gap between NoSQL and SQL systems. Both have their utility and their uses in modern environments. To many developers, however, being able to reuse tools and experience in the context of a scale-out database means not having to compromise on scale versus consistency.

Where Are We Heading?

The last few years have seen renewed excitement around SQL. NewSQL systems have emerged that support transactional SQL, built on original architectures that address scale-out requirements. These systems are demonstrating that transactions and SQL can scale when built on the right design. Google, for instance, developed F1 because it viewed SQL as the right way to address concurrency, consistency, and durability requirements. F1 is specific to the Google infrastructure but is proof that SQL can scale and that the programming model still solves critical problems in today's data centers.

Increasingly, NewSQL systems are showing scale, schema flexibility, and ease of use. Interestingly, many NoSQL and analytic systems are now putting limited transactional support or richer query languages into their roadmaps in a move to fill in the gaps around ACID and declarative programming. What that means for the evolution of these systems is yet to be seen, but clearly, the appeal of Codd's model is as strong as ever 43 years later.


Seth Proctor serves as Chief Technology Officer of NuoDB Inc. and has more than 15 years of experience in the research, design, and implementation of scalable systems. His previous work includes contributions to the Java security framework, the Solaris operating system, and several open-source projects.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-d0bfd93dc75aea294c4bfb005b29af8c
2013-12-16T17:10:41

The article says "It's natural to ask why SQL is seen as a mismatch for scale-out architectures, and there are a few key answers. The first is that traditional SQL implementations have trouble scaling horizontally. This has led to approaches like sharding, passive replication, and shared-disk clustering."

This has me confused. SQL is only a language for interacting with a DBMS. An implementation of SQL may have visibility of sharding, but not replication or shared-disk clustering. Remember, the relational model (and SQL, to the extent that it's associated with the relational model) is supposed to shield developers from the physical storage details.

So the statement "traditional SQL implementations have trouble scaling horizontally" can be true. But the assertion that sharding, passive replication, and shared-disk clustering came about to overcome any limitation of SQL is not. They’ve come about to improve [horizontal] scalability, and SQL can just go along for the ride.

A common justification for NoSQL is that /joins/ don't scale. That would be particularly true if a row stored on nodeA of a cluster needs to be joined to a row stored on nodeB of a cluster. But, even in a single node, the necessity to join 2 or more rows can never compete with the ability to fetch all the data, all at once. But, you don’t need SQL to have joins. So NoSQL database systems (as the author correctly points out) are typically schema-less. Data is effectively pre-joined. Each designer/developer has to decide whether the loss of all the goodness that comes with relational, SQL, and joins is outweighed by that performance advantage. I venture that for all but the most extreme cases, it is not (which lends credibility to the author's assertion that SQL is regaining popularity, further substantiated by RGIBSON000’s real-world experience).

Again, sharding, passive replication, and shared-disk clustering are not incompatible with SQL. Distributed query processing can access data from multiple shards, if necessary. And, replication (passive, or not) is done below the level of SQL, at the DBMS engine. Lastly, Oracle RAC is probably the best known example of shared-disk clustering.

Later, the article goes on "In practice, developers still need some contract with their data to be effective. It's flexibility that's needed — an easy and efficient way to change structure and types as an application evolves. The common perception is that SQL cannot provide this flexibility..."

I enthusiastically agree with the first part of this snippet, and equally enthusiastically disagree with the latter part. I think the perception is that SQL can provide this flexibility: the ALTER TABLE statement handles it well. It is true that one or another implementation may be prohibitively expensive, but that is an implementation issue, not a shortcoming of SQL's ALTER TABLE itself.

NoSQL systems don't make it any easier to modify the structure of existing data. In fact, in the absence of an equivalent to ALTER TABLE, I'd argue they make it more difficult, and limit the ability to mitigate the potential of being prohibitively expensive. For example, an implementation of ALTER TABLE can simply record the schema modification and migrate the data from version 1 to version 2 over time as the data is updated, spreading the cost out versus making one hugely expensive sweep through the database to transform the structure. Without a schema, NoSQL systems don't have this ability. To change the data in the database to (for instance) add a field, every instance has to be read, modified in the program, then written back to the database.

This article also has an interesting take on the "what's old, is new" angle on NoSQL: http://www.opensourceconnectio...

In conclusion, NewSQL may be an the answer to the sacrifices that NoSQL imposes. But the characterization in this article of the reasons for the rise of NoSQL don't ring true to me.


Permalink
ubm_techweb_disqus_sso_-dc42a8902329c558b332ae80a2d91a83
2013-12-05T00:15:58

It is now time to reprint Fabian Pascal's proper introduction to RDBMS "Understanding relational databases with examples in SQL-92".


Permalink
ubm_techweb_disqus_sso_-dc42a8902329c558b332ae80a2d91a83
2013-12-05T00:13:06

SQL is not Relational in the strict sense and has nothing to do with Codd. IIRC Codd thought SQL was detrimental.


Permalink
ubm_techweb_disqus_sso_-dc42a8902329c558b332ae80a2d91a83
2013-12-05T00:11:32

The beauty of Codd's work is that if you follow his rules you are guaranteed to have no surprises if you designed the tables properly. But structure changes are fairly easy to from a logical point of view. Scaling is a separate issue from the logical layer of data and NuoDB has an interesting approach to solving the scaling issue while keeping Codd's simplicity and guarantees based upon mathematical modeling.


Permalink
ubm_techweb_disqus_sso_-49eb93aa4f856acae434d5242405a683
2013-12-04T11:45:10

Good article. When my company switched to NoSQL from SQL I was concerned about the huge loss of functionality provided by our existing SQL DB. We sacrificed a lot of functionality with only one major benefit - ability to scale out.

Apart from ACID etc., the SQL language is itself very powerful in retrieving quick answers to system problems or customer queries. It is especially useful for "support" staff that investigate such queries every day. NoSQL query tools don't seem to match up. I'm glad to hear that SQL is gaining popularity again.


Permalink
ubm_techweb_disqus_sso_-a236e6e181827b4cf1098dc67759828b
2013-10-22T00:31:05

This paper is good in bringing together the issue of how new SQL implementations like NuoDB and F1 are addressing the issue of scalability. In our research at Espresso Logic, we found few other things that are restricting SQL growth.

Ease of connectivity from mobile - thereby having to directly access data over RESTful interface will be great

Simplification of business logic creation - the current method in current SQL world are just too restrictive. This needs something like Reactive Programming type logic to solve the problem.

We at Espresso Logic provide an instant RESTful interface to SQL data and automates lot of business logic programming using Reactive programming concepts. That we think is what is also needed to make SQL popular again.


Permalink

Video