Thursday, June 11, 2009

Do we *really* want SQL in HTML5?

Jan Lehnardt's tweet pointed me to this ongoing conversation about the new SQL support in HTML5. Vladimir Vukicevic I think summarizes the issue quite well:
There's been a lot of interest around the Web Storage spec (formerly part of whatwg HTML5), which exposes a SQL database to web applications to use for data storage, both for online and offline use. It presents a simple API designed for executing SQL statements and reading result rows. But there's an interesting problem with this; unlike the rest of the HtML5, this section defines a core piece of functionality in terms of an undefined chunk referenced as "SQL".

The initial implementations of Web Storage are both based on SQLite, and expose the dialect of SQL understood by SQLite to web content. I'm actually a big fan of SQLite, and was one of the advocates for pulling it into the Gecko platform. However, SQLite implements a variant of SQL, with a number of deviations from other SQL engines, especially in terms of the types of data that can be placed in columns.

Web content that is created to use database storage with SQLite as the backing is unlikely to work with any other backend database. Similarly, if another database was chosen as a browser's backing implementation, web content that works with it is unlikely to work with anything else. This is a serious interop problem, the root of which is that there really isn't a useful core SQL standard. SQL92 is generally taken as a base, but is often extended or altered by implementations. Even beyond the parser issues (which could be resolved by defining a strict syntax to be used by Web Storage), the underlying implementation details will affect results.

So, the only option is for the Web Storage portion of the spec to state "do what SQLite does". This isn't specified in sufficient detail anywhere to be able to reimplement it from the documents, so it would be even worse — "do what this exact version of SQLite does", because there are no guarantees that SQLite won't make any incompatible changes. For example, a future SQLite 4 may introduce some changes or some new syntax which wouldn't be supported by earlier versions. Thus, it requires every single browser developer to accept SQLite as part of their
platform.

There then follows a lot of discussion about how to handle this, from creating an ORM layer to fully specifying the standard subset of SQL that will be supported in HTML5 (as if there were a valid standard). One person argues that we can simply refer to the ISO SQL standard and then describe extensions to that standard.

OK, well, I have worked with relational databases for over 20 years, and I have seen how completely stuck customers are with a given database vendor. I don't care how "standard" the standard is, it's merely an illusion. There is a reason Oracle can charge such huge sums of money and smile as customers fume. There is a reason Sybase is still in business. It is almost impossible to move a large database application from one vendor to another.

We all know how difficult it is to write code today that is compatible across all browsers. Imagine if each browser had its own SQL database that "implemented the standard." That's where we're heading, and it sure seems hellish to me. I guess if we forced every browser to use SQLite, we'd be OK, but that seems to placing an awful lot of reliance on a single technology. And as Vladimir points out, that doesn't solve issues of incompatibility across versions of SQLite.

It seems to me the requirements are to have rich storage, particularly for Javascript objects, and a way to query over this storage. I know saying is easier than doing, but shouldn't we be defining an API that defines JSON storage and a JSON query language? The Persevere folks have already defined a JSON query language, and they seem to have put a lot of thought into it. Perhaps that could be used as a start.

I can't say I have a solution, but I'm just raising a flag that creating a standard as important as HTML5 with a placeholder for "SQL" is an invitation for major incompatibilities which we'll be digging ourselves out of for years to come.

3 comments:

jlorenzen said...

Google Gears has an offline ability that I assume includes some type of data storage. How do they implement it?

cigraphics said...

@jlorenzen Google Gears uses SQLite for storage

Drakanor said...

There are a lot of excellent programming languages for web applications, more to come. All of them can handle database access safer and on several levels. There is ORM, Pooling and Caching. No large site will run without this.

So who the hell needs SQL in HTML? Blogs and forums are crowded with people doing strange and cruel things with LAMP or WAMP, representing a huge crowd of wannabe coders. Please don't make this even easier, it won't help anybody. ;)