Connecting Big Data around the World

Tim Kraska

The internet enables millions of users worldwide to create, modify and share data through platforms like Twitter, Facebook, Gmail and many other services, generating gigantic data sets. This worldwide data access requires replicating the data across multiple data centers, not only to bring the data closer to the user for shorter latencies, but also to increase availability in case of a data center failure.
However, keeping replicas synchronized and consistent, so that a user’s data is never lost and always up to date, is expensive. Network delays between data centers are in the hundreds of milliseconds and vary significantly. Synchronous wide-area replication with strong consistency has therefore been assumed infeasible for interactive applications. Current solutions settle either for asynchronous replication, which risks losing data in the event of failures, or for relaxed consistency, which can, for example, cause updates to appear and disappear from the application unpredictably.

With MDCC (Multi-Data Center Consistency), we describe the first synchronous replication protocol that does not require a master or static partitioning and that provides strong consistency at a cost similar to eventually consistent protocols. In the normal operational case, it needs only a single round trip across data centers to apply an update. That is, users not only get faster response times because the data is located close to them, but they also always experience the same consistency and application behavior, regardless of whether major failures occur. We further propose a new programming model that empowers application developers to handle the longer and unpredictable latencies caused by inter-data center communication more effectively. Our evaluation uses the TPC-W benchmark, which simulates a web shop like Amazon, with MDCC deployed across 5 geographically diverse data centers. It shows that MDCC achieves transaction throughput and latency similar to eventually consistent quorum protocols and that it can sustain a data center outage without a significant impact on response times, all while guaranteeing strong consistency.
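
To give a rough feel for why a single-round-trip quorum protocol can tolerate a data center outage with little impact on latency, here is a minimal sketch of generic quorum arithmetic. It is not the MDCC protocol itself. It assumes that an update is sent to all data centers in parallel and is applied once a quorum has answered, so the wait time is the k-th smallest round-trip time. The function name, the data center names, the round-trip times and the choice of a simple majority quorum are all hypothetical and not taken from the MDCC evaluation.

```python
def quorum_latency_ms(rtts_ms, quorum_size):
    """Return the time until `quorum_size` data centers have answered.

    `rtts_ms` maps a data center name to its round-trip time in
    milliseconds, or to None if that data center is unreachable.
    All requests are assumed to be sent in parallel, so the answer is
    the quorum_size-th smallest round-trip time among reachable sites.
    """
    reachable = sorted(rtt for rtt in rtts_ms.values() if rtt is not None)
    if len(reachable) < quorum_size:
        raise RuntimeError("not enough reachable data centers for a quorum")
    return reachable[quorum_size - 1]


# Hypothetical round-trip times as seen from a client in "us-west".
rtts = {"us-west": 0, "us-east": 80, "eu": 150, "asia-1": 170, "asia-2": 200}

# Assumption for illustration: a simple majority quorum (3 of 5).
majority = len(rtts) // 2 + 1

normal = quorum_latency_ms(rtts, majority)

# Simulate an outage of one data center by marking it unreachable.
with_outage = quorum_latency_ms({**rtts, "eu": None}, majority)

print(f"normal: {normal} ms, with one data center down: {with_outage} ms")
```

With these made-up numbers, the update waits for the third-fastest reply in both cases; losing one data center only shifts which reply that is. A master-based scheme would instead stall, or need a failover, whenever the master's data center is the one that fails.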

For more information, please visit our MDCC website.

MDCC was developed by Tim Kraska, Gene Pang, Mike Franklin and Samuel Madden.