Thursday, December 20, 2007

Eventual consistency - following the Middle Path

One of the features of Amazon's SimpleDB is that a write to the database may not be immediately visible to all readers, and that the user of the interface needs to be aware of this and work with it accordingly.

For those of us used to working with relational databases, which generally provide support for full read consistency, this is a somewhat revolutionary thought. But Werner Vogels, CTO of Amazon, explains in his blog that in scalable systems where data must be shared across machines, consistency inherently limits availability, particularly in larger systems where network partitioning is common:
Eric [Brewer] presented the CAP theorem, which states that of three properties of shared-data systems; data consistency, system availability and tolerance to network partition one can only achieve two at any given time. A more formal confirmation can be found in a paper by Gilbert and Lynch.

A system that is not tolerant to network partitions can achieve data consistency and availability, and often does so by using transaction protocols. To make this work, client and storage systems are part of the same environment and they fail as a whole under certain scenarios and as such clients cannot observe partitions. An important observation is that in larger distributed scale systems, network partitions are a given and as such consistency and availability cannot be achieved at the same time. This means that one has two choices on what to drop; relaxing consistency will allow the system to remain highly available under the partitionable conditions and prioritizing consistency means that under certain conditions the system will not be available.

Both require the client developer to be aware of what the system is offering. If the system emphasizes consistency, the developer has to deal with the fact that system may not be available to take for example a write. If this write fails because of system unavailability the developer will have to deal with what to do with the data to be written. If the system emphasizes availability, it may always accept the write but under certain conditions a read will not reflect the result of a recently completed write. The developer then has to make a decision about whether the client requires access to the absolute latest update all the time. There is a range of applications that can handle slightly stale data and they are served well under this model.
He then goes on to describe one form of consistency, called eventual consistency, where there is a time lag between an update and the ability of all clients to read that update.
Eventual consistency. The storage system guarantees that if no new updates are made to the object eventually (after the inconsistency window closes) all accesses will return the last updated value. The most popular system that implements eventual consistency is DNS, the domain name system. Updates to a name are distributed according to a configured pattern and in combination with time controlled caches, eventually all clients will see the update.
I love this -- eventual consistency. I think this is the way things actually work on the large scale in the real world. And when you design a system this way, it allows you to, in a sense, breathe, and you end up with something that is much more tolerant and scalable.
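
To make the "inconsistency window" concrete, here is a toy sketch in Python. It is not how SimpleDB or DNS works internally. The class EventuallyConsistentStore and its write, read, and propagate methods are names I made up for illustration, and replication is modeled as nothing more than an in-memory queue of updates waiting to reach each replica.

```python
class EventuallyConsistentStore:
    """Toy model (hypothetical): a primary copy plus replicas that lag behind it."""

    def __init__(self, replica_count=3):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Updates accepted by the primary but not yet applied to a replica,
        # stored as (replica_index, key, value) in arrival order.
        self.pending = []

    def write(self, key, value):
        # The write is accepted right away (availability is favored) ...
        self.primary[key] = value
        # ... but each replica only learns about it later.
        for index in range(len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # A reader talks to a single replica, so it may see a stale value.
        return self.replicas[replica_index].get(key)

    def propagate(self, max_updates=None):
        # Apply queued updates in order. With no argument the whole queue is
        # drained, which is the moment the inconsistency window closes.
        count = len(self.pending) if max_updates is None else max_updates
        for _ in range(min(count, len(self.pending))):
            index, key, value = self.pending.pop(0)
            self.replicas[index][key] = value


store = EventuallyConsistentStore(replica_count=3)
store.write("color", "blue")

before = [store.read("color", i) for i in range(3)]  # nothing has propagated yet
store.propagate(max_updates=1)
during = [store.read("color", i) for i in range(3)]  # only replica 0 has the write
store.propagate()
after = [store.read("color", i) for i in range(3)]   # every replica has converged

print(before)  # [None, None, None]
print(during)  # ['blue', None, None]
print(after)   # ['blue', 'blue', 'blue']
```

The write succeeds immediately, yet for a while different readers get different answers depending on which replica they happen to ask. Once updates stop and propagation finishes, everyone sees the same value. An application that can live with the middle state gets the availability; one that cannot has to wait or ask for a stronger guarantee.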

It reminds me of the story of the Buddha. He was trying to find God, and he was so strict and severe in his austerities that he was practically starving. He was feeling lost, wondering why all this effort wasn't leading him to God. Then he overheard a man instructing a student how to tune a stringed instrument: "Don't tune the string too loose, or it will make no sound. Don't tune it too tight, or the string will break." And Buddha saw this as a message to him to follow the Middle Path, to relax and breathe while still following a healthy discipline.

In a way, requiring strict consistency is like tuning the string too tight. The constraints are so severe that the system is brittle and easily breaks.

So, Grasshopper, when building scalable systems, may we follow the Middle Path of Eventual Consistency.
