Friday, June 20, 2008

Query languages for Hadoop, and tooling for non-relational stores

Tom White just blogged about three, count them three, different query languages being built for Hadoop.
  • Pig, from Yahoo! and now incubating at Apache, has an imperative language called Pig Latin for performing operations on large data files.
  • Jaql, from IBM and soon to be open sourced, is a declarative query language for JSON data.
  • Hive, from Facebook and soon to become a Hadoop contrib module, is a data warehouse system with a declarative query language that is a hybrid of SQL and Hadoop streaming.

As someone working on database tooling, I have been thinking about what we at NetBeans might do to make it easier for developers to build apps against new platforms and products such as Hadoop, CouchDB, and Amazon's services.

These query languages are one example: I can see an interactive query editor that lets you write and test queries directly against your Hadoop or CouchDB engine, much as you can today with SQL tools.

Another interesting question: if I write my application in terms of domain objects, what can a tool do to make it easy to map those objects to an underlying non-relational store like Hadoop, CouchDB, or SimpleDB?
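To make the mapping question a bit more concrete, here is a minimal sketch in Python of the kind of glue code a tool might generate. Everything in it is an assumption for illustration: the `Customer` class and the `to_document` / `from_document` helpers are hypothetical names, and no real store API is called. The document is just a plain dict serialized to JSON, which is roughly the shape a document store like CouchDB accepts.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Customer:
    """Hypothetical domain object, used only for illustration."""
    customer_id: str
    name: str
    email: str


def to_document(obj):
    """Turn a dataclass instance into a schema-free document (a plain dict)."""
    doc = asdict(obj)
    # Record the domain type so the document can be mapped back later.
    doc["type"] = type(obj).__name__
    return doc


def from_document(cls, doc):
    """Rebuild a domain object, ignoring keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in doc.items() if key in known})


customer = Customer("c-1", "Ada", "ada@example.com")
doc = to_document(customer)

# JSON text is what a tool might hand to a document store; here it only
# round-trips in memory.
payload = json.dumps(doc, sort_keys=True)
restored = from_document(Customer, json.loads(payload))
```

Keeping a `type` field in each document is one possible design choice, not the only one: it lets a single store hold several kinds of domain objects, at the cost of a convention that every reader and writer has to respect.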

Definitely some opportunities here to make developers' lives easier.

1 comment:

JanL said...

"I can see an interactive query editor […]"

CouchDB comes with Futon, the administration interface, which has just that: an interactive editor that lets you build and tweak queries on the fly.