Tuesday, March 11, 2008

Stonebraker's H-Store: There's something happenin' here



It was fascinating for me to read this overview by Curt Monash of Michael Stonebraker's new database architecture for the web called the H-Store. It is a complete overturning of the traditional database architecture. And I quote:
On the system side, some of their most radical suggestions include:
  • No disks or other persistent storage at all.
  • No multi-threading.
  • No locks
  • No redo logs (and perhaps not a lot of undo logs either).
Wow. And here's something else: the engine has a single API: execute transaction(parameters). In other words, all it does is stored procedures. And the language they are proposing for these stored procedures? Ruby.

We're not talking small changes here. What is motivating these kinds of changes? The web. In particular, traditional databases and database APIs like JDBC and ODBC are tuned for a very old style of application, where the user is working over a set of data on a single connection which is holding some level of lock on the database. In Sybase we used to call this "browse mode," and then this got standardized in SQL with the concept of cursors. I remember we had to do a lot of work in the engine and the client API to support browse mode/cursors, because our customers were demanding it.

But now in many cases, this style of interaction with a database is completely irrelevant. Cursors just do not make sense for a web-based application, where the user comes and goes and the last thing you want to do is hold on to a connection across requests. Instead, you have to use loose transactional models like optimistic concurrency to handle conflicts. And when you let go of locks and long-running transactions, you can start letting go of a lot of other aspects of the traditional database.

And then there is web scale. Standard client/server apps needed to scale, but not at the level of consumer-based web applications. A traditional database's transactional, single machine, disk-based model just doesn't cut it. I was at a PHP conference last year, and was amazed to see that the "standard" way of scaling a web application that talks to the database is to do your own partitioning and run multiple instances of a database. When an application has to handle all of the details of managing a self-partitioned database, we know things have gone astray.

The list above is a great summary, but I recommend you spend the time to read the 10-page paper (PDF). It is quite an interesting read. It's not every day you read quotes like There is no redo log, no concurrency control, and no distributed commit processing and H-Store ran 70,416 TPC-C transactions per second. In contrast, we could only coax 850 transactions per
second from the commercial system, in spite of several days of tuning by a professional DBA.


All of this leads me to some deeper (or more strategic) thoughts... The relational database is not going anywhere. But if those of us in the database space are not careful, we could very well be caught off guard by a new wave of database technologies that do the job for a lot of people but have nothing to do with traditional relational databases.

How does that Buffalo Springfield song go - stop, hey, what's that sound, everybody look what's goin' down...

1 comment:

Tom White said...

It is indeed an interesting time for databases. If you haven't already, you might be interested in checking out HBase, another column-oriented database that is a clone of Google's Bigtable. There's a nice write-up here.

Tom