Cassandra: The Definitive Guide Book Review
Cassandra, one of the many recent database systems riding on the coattails of the NoSQL mantra, has made inroads among some of the more notable websites today. Written by Eben Hewitt, this book claims to be the definitive guide on the subject. Is it? Read on to find out.
The book opens its introductory chapter by asking "What's wrong with relational databases?", after which the author points out the strengths and weaknesses of the RDBMS and gives the Cassandra "elevator pitch" on what makes it a better choice for a highly scalable, available, fault-tolerant, high-performance and schema-free solution. A brief history of the project is followed by a chapter on installing Cassandra and getting acquainted with its command line interface (creating a keyspace and column family, then writing and reading data). Chapter 3 details the Cassandra data model: clustering, columns, rows, sorting and design patterns. Chapter 4 demonstrates a sample hotel reservation application in Java, showing how to construct the data model, define the schema in YAML, connect to the database, and populate and search the data. Chapter 5 reviews Cassandra's architecture, from its system keyspace, compaction and Bloom filters to tombstones, managers and services, and messaging.
Chapters 6, 7 and 8 cover configuration (including cluster, replication and security settings), reading and writing data (including consistency levels, slice predicates, ranges and deletions), and clients (including Thrift, Avro, Hector, Chiton, Pelops, Kundera and Fauna). Chapters 9 through 11 cover monitoring (logging, JMX messaging, etc.), maintenance (obtaining statistics, managing load balancing), and performance tuning (storage, concurrency, caching, stress tests and JVM settings). The final chapter, written by contributing writer Jeremy Hanna, covers integration with Hadoop, the Apache project's set of open source tools for managing large distributed data sets. It presents an example of a simple MapReduce job, along with several useful tools (Pig, Hive) and other best practices.
While the book supplies examples written primarily in Java, interfaces to Cassandra also exist for popular languages like C#, Python, Ruby and Scala, though no code examples in those languages make it into the book. Cassandra's Java-centricity may put off some developers, given the uncertainty over Java's future ownership and direction and the several competing NoSQL choices vying for developers' attention these days. Thankfully, the author included an appendix that briefly explores these other choices (such as CouchDB, FlockDB, MongoDB and many others), comparing their implementation language, whether each is distributed or requires a schema, what API the client uses to talk to it, its CAP (Consistency, Availability, Partition Tolerance) support and its production use by big-name websites. The book also has a glossary of terms that readers new to the world of distributed database systems may need to become more familiar with.
In conclusion, Cassandra, a Java-based NoSQL system open-sourced by Facebook in 2008, is being used by some recognizable websites (especially among techies) including Cisco, Rackspace, Reddit, Twitter and, of course, Facebook itself. That high-traffic web properties with this technical pedigree employ Cassandra is a testament to its robustness and durability. And while I hesitate to agree that this book is indeed the 'Definitive Guide', it certainly does offer a valuable introduction to the technology. Future editions should consider including at least an appendix demonstrating how languages other than Java talk to Cassandra. A real gem would be a case study with a heavily invested Cassandra user like Facebook on how they implemented it, what works and what doesn't, and what's on their wishlist. But for now, the first edition of the book greases the wheels for developers and DBAs interested in learning what Cassandra has to offer.
Title: Cassandra: The Definitive Guide
Author: Eben Hewitt
Publisher: O'Reilly Media
ISBN: 978-1-4493-9041-9
Pages: 336
Price: $31.99 (Ebook), $39.99 (Print)