Monday, January 26, 2009
Interesting stuff from LinkedIn - Project Voldemort
Although the name is a bit odd, Project Voldemort itself looks interesting.
The folks at LinkedIn have open sourced a distributed cache/storage engine under the Apache 2.0 license. The interface looks a lot like memcached: get(key), put(key, value), delete(key). The key (haha) difference is that it is not just a cache - it also provides persistent storage.
I recommend taking a look at their design page. Here are some things that caught my eye:
No structure, no queries
Project Voldemort explicitly eliminates the structured form of relational databases and queries, just like memcached. This means if you want to do things like queries, joins, etc., then you need to do it yourself.
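To make the "do it yourself" point concrete, here is a minimal sketch of a join done on the client side over a get/put/delete interface. The KeyValueStore class is a hypothetical in-memory stand-in backed by a dict, not Voldemort's actual client API, and the key naming scheme ("user:1", "orders:1") is an assumption made purely for illustration.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for a get/put/delete store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def orders_with_user_names(store, user_ids):
    """Client-side 'join': one lookup for the user, one for their orders."""
    rows = []
    for user_id in user_ids:
        user = store.get(f"user:{user_id}")
        if user is None:
            continue  # unknown user, nothing to join
        for order in store.get(f"orders:{user_id}") or []:
            rows.append((user["name"], order))
    return rows


store = KeyValueStore()
store.put("user:1", {"name": "alice"})
store.put("user:2", {"name": "bob"})
store.put("orders:1", ["book", "lamp"])
store.put("orders:2", ["pen"])

# User 3 does not exist and is simply skipped.
joined = orders_with_user_names(store, [1, 2, 3])
```

Every row in the result costs the client at least one extra round trip, which is exactly the work a relational database would otherwise do for you behind a single query.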
It appears that one way they solve this is by precomputing "answers" to queries with Hadoop jobs and then loading the results back into the storage engine en masse. That is much more efficient than trying to run the queries against your "live" store.
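The precompute-then-load pattern can be sketched roughly as follows. This is not how LinkedIn's pipeline is written: plain Python stands in for the Hadoop job, a dict stands in for the live store, and the function names, the "connections:" key prefix, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict


def build_answers(connections):
    """Batch step (stand-in for a Hadoop job): compute each member's
    connection list offline, keyed the way the live store will serve it."""
    peers = defaultdict(set)
    for a, b in connections:
        peers[a].add(b)
        peers[b].add(a)
    return {f"connections:{member}": sorted(found)
            for member, found in peers.items()}


def bulk_load(store, answers):
    """Load step: push every precomputed answer into the store in one pass."""
    for key, value in answers.items():
        store[key] = value


live_store = {}  # hypothetical stand-in for the key-value store
edges = [("ann", "bob"), ("bob", "cy"), ("ann", "cy")]
bulk_load(live_store, build_answers(edges))

# At read time the "query" is now a single key lookup.
bobs_connections = live_store["connections:bob"]
```

The expensive scan over all the data happens once, offline; the live store only ever sees simple puts and gets.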
Eventual Consistency and Ordering of Versions
They also seem to be following the principles laid out by Werner Vogels and the Amazon team around providing eventual consistency.
I also particularly liked how they do versioning (a version is defined by a tuple of server numbers and version numbers) and how they handle conflicts and fix consistency issues. They go ahead and write whenever you want to write. Then, when someone does a read, they look at the various versions and decide which one wins, or decide there is a conflict and mark it as such so that the problem can be resolved manually.
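Versions made of (server, counter) pairs are what is commonly called a vector clock. Here is a minimal sketch of how read-time resolution could work under that scheme. It is an illustration, not Voldemort's code: each clock is represented as a dict from a server id to a counter, and the function names compare and resolve are hypothetical.

```python
def compare(a, b):
    """Order two clocks: 'before', 'after', 'equal', or 'concurrent'."""
    servers = set(a) | set(b)
    # A server missing from a clock counts as counter 0.
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in servers)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in servers)
    if a_ahead and b_ahead:
        return "concurrent"  # neither version has seen the other's write
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def resolve(versions):
    """Read-time resolution: drop every version that another version
    supersedes. One survivor is the winner; several mean a conflict."""
    survivors = []
    for clock, value in versions:
        if any(compare(clock, other) == "before" for other, _ in versions):
            continue  # strictly older than some other version
        if (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors


# Server s1 wrote twice: the second write supersedes the first.
clear_winner = resolve([({"s1": 1}, "old"), ({"s1": 2}, "new")])

# s1 and s2 each wrote on top of the same starting version: a conflict.
conflict = resolve([({"s1": 2}, "from s1"), ({"s1": 1, "s2": 1}, "from s2")])
```

In the first case a single version survives and the read can return it directly. In the second case both versions survive, so the read has to surface the conflict for someone (or some application logic) to settle.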
But Does It Work?
Being a long-term database guy, I always wonder what key functionality you are giving up when you go for the simple key/value way of doing things. I know it scales, and I know it is fast, and I know it avoids issues with network partitioning. But what requirements does it place on the client as a result? They mention, for instance, that this solution separates business logic from data storage, and that's a good thing. It's funny, because in my Sybase days, placing business logic close to data storage was considered the right way to go - function shipping instead of data shipping.
Anyway, it looks like another distributed key-value store has hit the streets. I have some time right now, so maybe I'll take a closer look. And I'll be doing the same thing with SimpleDB and CouchDB while I'm at it...