Much has been made in the past several years about SQL versus NoSQL and which model is better suited to modern, scale-out deployments. Lost in many of these arguments is the raison d'être for SQL and the difference between model and implementation. As new architectures emerge, the question is why SQL endures and why there is such a renewed interest in it today.
In 1970, Edgar Codd captured his thoughts on relational logic in a paper that laid out rules for structuring and querying data. A decade later, the Structured Query Language (SQL) began to emerge. While not entirely faithful to Codd's original rules, it provided relational capabilities through a mostly declarative language and helped solve the problem of how to manage growing quantities of data.
Over the next 30 years, SQL evolved into the canonical data-management language, thanks largely to the clarity and power of its underlying model and transactional guarantees. For much of that time, deployments were dominated by scale-up or "vertical" architectures, in which increased capacity comes from upgrading to bigger, individual systems. Unsurprisingly, this is also the design path that most SQL implementations followed.
The term "NoSQL" first appeared in 1998 as the name of a database that provided relational logic but eschewed SQL. It wasn't until 2009 that the term took on its current, non-ACID meaning. By then, typical deployments had already shifted to scale-out or "horizontal" models. The perception was that SQL could not provide scale-out capability, and so new non-SQL programming models gained popularity.
Fast-forward to 2013, and after a period of decline, SQL is regaining popularity in the form of NewSQL implementations. Arguably, SQL never really lost popularity (the market is estimated at $30 billion and growing); it just went out of style. Either way, this new generation of systems is stepping back to look at the last 40 years, asking what that history tells us about future design, and applying the power of relational logic to the requirements of scale-out deployments.
SQL evolved as a language because it solved concrete problems. The relational model was built on capturing the flow of real-world data. If a purchase is made, it relates to some customer and product. If a song is played, it relates to an artist, an album, a genre, and so on. By defining these relations, programmers know how to work with data, and the system knows how to optimize queries. Once these relations are defined, then other uses of the data (audit, governance, etc.) are much easier.
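As a rough illustration of the purchase example, here is a minimal sketch using Python's standard `sqlite3` module. The table names, column names, and sample rows are assumptions made up for this example. It shows how declaring that a purchase relates to a customer and a product lets a single declarative query answer a question the schema was never specifically designed for.

```python
import sqlite3


def build_store():
    """Create an in-memory database with hypothetical customer/product/purchase tables."""
    conn = sqlite3.connect(":memory:")
    # Ask SQLite to enforce the declared relations (off by default in SQLite).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE product (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            price_cents INTEGER NOT NULL
        );
        -- A purchase relates to exactly one customer and one product.
        CREATE TABLE purchase (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            product_id INTEGER NOT NULL REFERENCES product(id),
            quantity INTEGER NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO product VALUES (10, 'Keyboard', 4500), (11, 'Mouse', 1500);
        INSERT INTO purchase VALUES (100, 1, 10, 1), (101, 1, 11, 2), (102, 2, 11, 1);
    """)
    return conn


def spend_by_customer(conn):
    """Total spend per customer, expressed declaratively through the relations."""
    rows = conn.execute("""
        SELECT c.name, SUM(p.price_cents * pu.quantity)
        FROM purchase AS pu
        JOIN customer AS c ON c.id = pu.customer_id
        JOIN product AS p ON p.id = pu.product_id
        GROUP BY c.name
        ORDER BY c.name
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(spend_by_customer(build_store()))
```

The query states what is wanted, not how to fetch it; the join order and access paths are left to the database, which is the optimization opportunity the relational model provides.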
Layered on top of this model are transactions. A transaction is a boundary that guarantees the programmer all-or-nothing execution (either everything succeeds or nothing is changed), a consistent view of the database, and independent execution relative to other transactions, with clear behavior when two transactions try to make conflicting changes. Those are the A (atomicity), C (consistency), and I (isolation) in ACID. To say a transaction has committed means that these rules were met and that any changes were made durable (the D in ACID).
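The all-or-nothing behavior can be sketched with Python's standard `sqlite3` module. The `account` table, the names in it, and the `transfer` helper are hypothetical, invented for this illustration. The credit is deliberately applied before the debit so that a failed debit shows the earlier change being undone.

```python
import sqlite3


def open_bank():
    """Create a hypothetical account table whose balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; return True if committed, False if rolled back."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back along with it.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

A single in-memory connection only demonstrates atomicity and the consistency rule; isolation between concurrent transactions and durability across failures are not exercised by this sketch.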
Transactions were introduced as a simplification. They free developers from having to think about concurrent access, locking, or whether their changes are recorded. In this model, a multithreaded service can be programmed as if there were only a single thread. Such programming simplification is extremely useful on a single server. When scaling across a distributed environment, it becomes critical.
With these features in place, developers building on SQL were able to be more productive and focus on their applications. Of particular importance is consistency. Many NoSQL systems sacrifice consistency for scalability, putting the burden back on application developers. This trade-off makes it easier to build a scale-out database, but typically leaves developers choosing between scale and transactional consistency.
Why Not SQL?
It's natural to ask why SQL is seen as a mismatch for scale-out architectures, and there are a few key answers. The first is that traditional SQL implementations have trouble scaling horizontally. This has led to approaches like sharding, passive replication, and shared-disk clustering. These limitations, however, stem from designs built around direct disk interaction and limited main memory; they are not inherent in SQL.
A second issue is structure. Many NoSQL systems tout the benefit of having no (or a limited) schema. In practice, developers still need some contract with their data to be effective. What's needed is flexibility: an easy and efficient way to change structure and types as an application evolves. The common perception is that SQL cannot provide this flexibility, but again, this is a function of implementation. When table structure is tied to on-disk representation, making changes to that structure is very expensive, whereas nothing in Codd's logic makes adding or renaming a column expensive.
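To make the schema-evolution point concrete, here is a small sketch using Python's standard `sqlite3` module; the `song` table and its rows are assumptions invented for the example. SQLite is used only because it ships with Python. Its documented behavior for `ADD COLUMN` is to update the schema without rewriting existing rows, which is one example of an implementation where this kind of change is cheap.

```python
import sqlite3


def evolve_schema():
    """Add a column to a populated table and show old and new rows side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute("INSERT INTO song (title) VALUES ('So What'), ('Blue in Green')")

    # The application evolves: songs now carry a genre. Existing rows pick up
    # the declared default rather than requiring a manual backfill.
    conn.execute("ALTER TABLE song ADD COLUMN genre TEXT NOT NULL DEFAULT 'unknown'")
    conn.execute("INSERT INTO song (title, genre) VALUES ('Naima', 'jazz')")
    conn.commit()

    return conn.execute("SELECT title, genre FROM song ORDER BY id").fetchall()


if __name__ == "__main__":
    for title, genre in evolve_schema():
        print(title, genre)
```

The contract with the data stays explicit (every song has a genre, defaulting to a known value), while the cost of the change is left to the implementation.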
Finally, some argue that SQL itself is too complicated a language for today's programmers. The arguments on both sides are somewhat subjective, but the reality is that SQL is a widely used language with a large community of programmers and a deep base of tools for tasks like authoring, backup, or analysis. Many NewSQL systems are layering simpler languages on top of full SQL support to help bridge the gap between NoSQL and SQL systems. Both approaches have their uses in modern environments. To many developers, however, being able to reuse tools and experience in the context of a scale-out database means not having to choose between scale and consistency.
Where Are We Heading?
The last few years have seen renewed excitement around SQL. NewSQL systems have emerged that support transactional SQL, built on original architectures that address scale-out requirements. These systems are demonstrating that transactions and SQL can scale when built on the right design. Google, for instance, developed F1 because it viewed SQL as the right way to address concurrency, consistency, and durability requirements. F1 is specific to the Google infrastructure but is proof that SQL can scale and that the programming model still solves critical problems in today's data centers.
Increasingly, NewSQL systems are showing scale, schema flexibility, and ease of use. Interestingly, many NoSQL and analytic systems are now putting limited transactional support or richer query languages into their roadmaps in a move to fill in the gaps around ACID and declarative programming. What that means for the evolution of these systems is yet to be seen, but clearly, the appeal of Codd's model is as strong as ever 43 years later.
Seth Proctor serves as Chief Technology Officer of NuoDB Inc. and has more than 15 years of experience in the research, design, and implementation of scalable systems. His previous work includes contributions to the Java security framework, the Solaris operating system, and several open-source projects.