NoSQL and MySQL Cluster
Like InnoDB, MySQL Cluster provides key-value access to its data through a Memcached API. MySQL Cluster's architecture lends itself to providing simultaneous access to the same data through additional APIs.
MySQL Cluster appears to clients as a single logical database even though underneath it is a distributed, shared-nothing, auto-sharded data store. As illustrated in Figure 3, MySQL Cluster actually comprises three types of nodes (processes):
- Data nodes manage the storage and access to data. Tables are automatically sharded across the data nodes, which also transparently handle load balancing, replication, failover, and self-healing.
- Application nodes provide connectivity from the application logic to the data nodes, through either the MySQL Server (SQL) or one of the NoSQL APIs.
- Management nodes are used to configure the cluster and provide arbitration in the event of network partitioning.
Figure 3: The layered design of MySQL Cluster.
In this way, MySQL Cluster delivers capabilities that are often considered attributes of NoSQL data stores:
- Built-in High Availability (synchronous replication locally within a cluster, asynchronous with remote clusters)
- Linear scale-out of read and write performance
- DevOps flexibility (on-line schema changes, addition of new nodes, upgrades and backups)
- High throughput and low latency (by default, all data is stored in RAM and then checkpointed to disk)
Because of this architecture, there is a clear separation of how the data is stored/processed and how it is accessed. At the lowest layer, all access to the data is via a native C++ client library (the NDB API), which implements the wire protocol to the data nodes; this client library is then used by the various APIs (including the MySQL Server). This is illustrated in Figure 4.
Figure 4: The MySQL Cluster stack.
Because the NoSQL APIs bypass the SQL layer and call the NDB API directly, their latency can be kept to a minimum.
Memcached API Implementation for MySQL Cluster
Memcached API support was introduced in MySQL Cluster 7.2. MySQL Cluster extends Memcached by adding support for write-intensive workloads, a full relational model with ACID compliance (including persistence), rich query support and auto-sharding.
Unlike the Memcached API for InnoDB, the MySQL Cluster version is implemented as a plugin within an external Memcached process (which then uses the NDB API client library to communicate with the Data Nodes storing the data). The implementation is simple:
- The application sends reads and writes to the Memcached process (using the standard Memcached API).
- This invokes the Memcached Driver for NDB (which is part of the same process).
- The NDB API is called, providing very quick access to the data held in MySQL Cluster's data nodes.
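From the application's side, the first step above is ordinary Memcached traffic. As a minimal sketch, the Python below builds and parses standard Memcached text-protocol messages for `set` and `get`. The helper names (`build_set`, `build_get`, `parse_get_response`) and the sample key are illustrative assumptions, and nothing in the code is specific to MySQL Cluster.

```python
# Illustrative helpers for the standard Memcached text protocol.
# Function names and sample keys are assumptions made for this sketch.

def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # Wire format: "set <key> <flags> <exptime> <bytes>" CRLF <data> CRLF
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    # Wire format: "get <key>" CRLF
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(response: bytes):
    # A miss is just "END" CRLF; a hit is
    # "VALUE <key> <flags> <bytes>" CRLF <data> CRLF "END" CRLF
    if response == b"END\r\n":
        return None
    header, _, rest = response.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected response header")
    length = int(parts[3])
    return rest[:length]
```

In practice an existing Memcached client library would produce these messages; the point is only that the application needs no MySQL-specific code to reach the data.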
The Memcached servers can be co-located with the data nodes or the application nodes, or run in a dedicated Memcached layer, depending on scalability and co-location needs. Note that every Memcached server has access to all of the data.
Developers can still have some or all of the data cached within the Memcached server (and specify whether that data should also be persisted in MySQL Cluster), so it is possible to choose how to treat different pieces of data:
- Storing the data purely in MySQL Cluster is best for data that is volatile (written to and read from frequently)
- Storing the data both in MySQL Cluster and in Memcached is often the best option for data that is rarely updated but frequently read
- Data that has a short lifetime, is read frequently, and does not need to be persistent could be stored only in Memcached
DevOps can configure this behavior on a per-key-prefix basis, and the application doesn't have to care: it just uses the Memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized.
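The per-key-prefix idea can be modeled in a few lines. The sketch below is not the actual configuration mechanism of the MySQL Cluster Memcached driver; the `Policy` and `Store` classes, the prefixes `ref:` and `tmp:`, and the two dictionaries standing in for the local cache and the cluster are all hypothetical names chosen to mirror the three cases listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    use_cache: bool    # keep a copy in the Memcached server's local cache
    use_cluster: bool  # persist the value in MySQL Cluster


# Hypothetical prefixes and policies, one per case described above.
POLICIES = {
    "": Policy(use_cache=False, use_cluster=True),      # default: cluster only
    "ref:": Policy(use_cache=True, use_cluster=True),   # rarely updated, often read
    "tmp:": Policy(use_cache=True, use_cluster=False),  # short-lived, not persistent
}


def policy_for(key):
    # The longest matching prefix wins; "" matches every key.
    best = max((p for p in POLICIES if key.startswith(p)), key=len)
    return POLICIES[best]


@dataclass
class Store:
    cache: dict = field(default_factory=dict)    # stand-in for the local cache
    cluster: dict = field(default_factory=dict)  # stand-in for the data nodes

    def set(self, key, value):
        policy = policy_for(key)
        if policy.use_cache:
            self.cache[key] = value
        if policy.use_cluster:
            self.cluster[key] = value

    def get(self, key):
        policy = policy_for(key)
        if policy.use_cache and key in self.cache:
            return self.cache[key]
        if policy.use_cluster:
            value = self.cluster.get(key)
            if value is not None and policy.use_cache:
                self.cache[key] = value  # populate the cache on a read miss
            return value
        return None
```

The application only ever calls `set` and `get`; where a value ends up is decided entirely by the prefix table, which is the separation of concerns the paragraph above describes.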
Using Memcached for Schemaless Data or Relational Tables
By default, every key/value pair is written to a common table, with each pair stored in a single row, thus allowing schemaless data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a predefined column in a specific table. The mapping of values to multiple columns is also supported.
Of course, if the application needs to access the same data through SQL, then developers can map key-prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.
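To make the mapping concrete, here is a small sketch in which an in-memory SQLite database stands in for a MySQL Cluster table. The `users` table, the `user:` prefix, the `PREFIX_MAP` structure, and the `|` separator between column values are all assumptions made for illustration; the real driver keeps its mappings in its own configuration and may delimit values differently.

```python
import sqlite3

# In-memory SQLite stands in for a MySQL Cluster table in this sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, city TEXT)")

# Hypothetical mapping: key prefix -> (table, key column, value columns).
PREFIX_MAP = {
    "user:": ("users", "id", ("name", "city")),
}


def resolve(key):
    # Find the mapping for this key and strip the prefix to get the row key.
    for prefix, (table, key_col, value_cols) in PREFIX_MAP.items():
        if key.startswith(prefix):
            return table, key_col, value_cols, key[len(prefix):]
    raise KeyError(f"no mapping for key {key!r}")


def kv_set(key, value):
    # Split the value into one field per mapped column ("|" is an assumption).
    table, key_col, value_cols, row_key = resolve(key)
    fields = value.split("|")
    if len(fields) != len(value_cols):
        raise ValueError("value does not match mapped columns")
    columns = ", ".join((key_col,) + value_cols)
    marks = ", ".join("?" * (len(value_cols) + 1))
    db.execute(
        f"INSERT OR REPLACE INTO {table} ({columns}) VALUES ({marks})",
        (row_key, *fields),
    )


def kv_get(key):
    # Read the mapped columns back and join them into a single value.
    table, key_col, value_cols, row_key = resolve(key)
    row = db.execute(
        f"SELECT {', '.join(value_cols)} FROM {table} WHERE {key_col} = ?",
        (row_key,),
    ).fetchone()
    return None if row is None else "|".join(row)
```

After `kv_set("user:42", "Ada|London")`, the same row is visible both to `kv_get("user:42")` and to an ordinary query such as `SELECT name FROM users WHERE city = 'London'`, which is the dual access path described above.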
Many of the configuration steps mirror those described above for InnoDB, so they are not repeated here. A detailed walkthrough on configuring and using the Memcached API with MySQL Cluster can be found on the "Scalable, persistent, HA NoSQL Memcache storage using MySQL Cluster" page.
The choice between SQL (rich, flexible queries, ACID transactions, mature software, and rich toolsets) and NoSQL (simple access patterns, high availability, and scalability) is often presented as an either/or proposition. This article demonstrates how you can use both options with either the MySQL database or MySQL Cluster.
Andrew Morgan is the Principal MySQL Product Manager at Oracle and Matt Lord is the MySQL Product Manager.