Channels ▼

Ken North

Dr. Dobb's Bloggers

Sharding, Replication, Caches, and In-Memory Databases

June 30, 2011

Caching, replication, and sharding have proven to be important tools in the modern database architect's toolbox. To meet the requirements of high-throughput websites, system architects are turning to distributed caches and in-memory data grids to reduce I/O operations against disk-based databases.

Developers looking for a cache solution for SQL databases often turn to memcached, an in-memory key-value store. It supports writing cache code using a variety of languages, including Java, C++, PHP, Ruby, Erlang, and Python.

Another approach to caching is to stay with the SQL paradigm for both cache and database operations. This is possible by using an in-memory SQL database as a cache for on-disk databases. One such in-memory database, H-Store, was shown to deliver excellent performance when properly partitioned. It morphed into a commercial product known as VoltDB. Other products that support main memory database operations include Oracle TimesTen, McObject eXtremeDB, and HyperSQL Database (HSQLDB). Besides acting as a cache for large on-disk databases, the in-memory databases are often embedded in applications, such as low-latency trading systems.

Going back to the analogy from my previous post, the flood control system for the Mississippi River often relied on distributaries to relieve the pressure on levees by distributing the flow of water. For very large database (VLDB) architectures, horizontal partitioning can provide a similar volume-reduction distribution function — in this case, the distribution of data so that multiple servers process queries.

Sharding

Sharding, or horizontal partitioning of data, is a proven solution for web-scale databases, such as those in use by social networking sites. One such site is Netlog, a social network that has 100 million friendships, 50 million unique visitors per month, and 5 billion page views per month. Netlog runs MySQL, with the typical database getting 3000+ queries per second. The data distribution architecture evolved through several partitioning schemes based on master-slave replication before settling on a sharding scheme based on user ID. The application has been designed to avoid expensive, cross-shard queries.

Manual methods are often used in the layout of horizontal partitions, but recent research and tools can assist the database designer in laying out shards. Hibernate is an open-source, object-relational mapping (ORM) tool for Java and . NET developers. It's a popular solution for providing a persistence framework for Java developers, and the Hibernate project has taken on the challenge of database partitioning. Hibernate Shards enables developers to work with data partitions while using the Hibernate Core API . To meet the re-sharding challenge, it supports virtual shards.

SQLAlchemy is a database toolkit for Python that enables a session to distribute queries across multiple databases. Besides providing object-relational mapping (ORM) support, SQLAlchemy also provides horizontal partitioning with ShardedSession and ShardedQuery classes.

MIT is the home of the RelationalCloud project, which focuses on Database-as-a-Service (DaaS). Cloud computing is characterized by dynamic workloads and a requirement for scalability and elastic computing, which will challenge any rigid data partitioning scheme. At the VLDB 2010 conference, a paper from MIT researchers presented Schism, a scheme for letting workloads drive replication and partitioning. The purpose of Schism partitioning is twofold; first, to balance partitions and second, to minimize distributed transactions. Researchers tested Schism with a variety of workloads. These included online transaction processing (OLTP) and complex social networks with multiple n-to-n relationships. The Schism researchers created a graph that represented the database and workload, and then used the METIS algorithm to partition the graph.

But sharding alone does not guarantee that database scalability problems won't cause a site to fall over. An outage at Foursquare.com serves as a reminder to constantly monitor workloads and distribution of data — and to rebalance when necessary.

Click here to read Part 1 of this post.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video