Tuesday, July 20, 2010

Mass extinctions, basalt flows, and us

Verbatim from my Pa (Dr. John Van Couvering, geologist and all-around Scientist), who wrote this up for his local neighborhood newsletter in upstate New York:

Most of us have heard of the great extinction that wiped the dinosaurs off the face of the earth.  Of course this didn’t really happen, because birds – aha! -- are dinosaurs. But few may be aware that the human presence on the planet follows, not one, but four narrow escapes from extinction, going all the way back before dinosaurs even evolved. We are rather like Fearless Fosdick (Li’l Abner’s comic hero) who pops up unscathed, with the lid on his head, from his hiding place in a trash can that has been riddled to shreds by the mob’s tommy guns. The crooks are gobsmacked. “Fosdick! We shot ya fulla holes!” “Took a bit of dodging,” he jauntily replies.

What did we dodge? In fact, those “extinction events” are deadly beyond belief: moments in planetary history when nearly everything on land dies, as well as a great deal of marine life. In the last of these, 65 million years ago, four entire orders of dinosaurs were exterminated, plus all but the airborne varieties of a fifth order, as well as two out of three orders of coral and entire groups of other marine life. To put the extinction of an order in perspective, extinguishing the modern Carnivora would require that every single living animal in that order -- every last cat, lion, tiger, leopard, cheetah, puma, jaguar, hyaena, aardvark, civet, dog, wolf, fox, bear, panda, seal, sea lion, walrus, weasel, mink, skunk, badger, wolverine, otter, raccoon, kinkajou, coati mundi and meerkat -- perish. If even one lonely pair of any of these species survived to reproduce, so would the Order Carnivora. The magnitude of disaster that can totally erase not one but a dozen orders is difficult to comprehend.

Huge meteorite impacts have been found to coincide with some major extinction events, but they may have been just the last straw. For one thing, giant impacts are turning out to be surprisingly common, with little or no extinction effect. While the asteroid that punched into Yucatan may have finished off Tyrannosaurus and the tetracorals, they were apparently sick and dying already. A few million years later, the equally large flying rock that blew a 57-mile wide hole into the bedrock under what is now Delaware and Maryland had no effect on earthly life whatever. Not only that, no impact has been found to coincide with the worst extinction of all, the Permo-Triassic event 250 million years ago, that killed off all but one species of non-amphibian land animals – which species, of course, is our ancestor. The real cause of extinction events appears to be flood basalts – extraordinary volcanic episodes that go on for a million years as flow after flow of lava wells from cracks in the continent to build up country-sized mounds called “traps”. It is probable, but not yet proven, that the gases emitted from the thousands of cubic kilometers of flood basalts create toxic pollution of land and sea. The Deccan Traps of central India mark the end of the dinosaurs, while the traps that underlie the steppes of Siberia are the same age as the great Permo-Triassic extinction.

Given the odds, it is to be expected that mass extinctions have chancy, even unfair consequences. After the Permo-Triassic massacre our ancestral line successively developed into anapsids (turtles), synapsids (mammals) and diapsids (everything else). Each of these is a higher stage of evolution, with a basic physiology better adapted to life on land. By some freak chance, a single lowly synapsid got through the extinction at the end of the Triassic, 190 million years ago, along with a variety of diapsids on their way to crocodiles, lizards, and all kinds of dinosaurs. This primitive synapsid line, which seems to have favored bugs and fruit, quietly developed warm-blooded circulation, milk glands and eventually a uterus. And when the grand house of the dinosaurs came crashing down, guess who was hiding under the ruins?

Our final lucky break came around 48 million years ago, when a solitary species of primitive primate got stranded on the (then) island of Africa, never to see its many relatives in North America and Asia again. Bingo, Columbia River basalts. This was not one of the great extinction events, but it was enough to kill every primate in the world – except for that forgotten waif in Africa. Now, many lemurs and lorises and monkeys and apes later, here we are, self-importantly going on about our human-caused extinction event. If we knew how far we have yet to go, perhaps we just might give it up.

Friday, July 16, 2010

Apache Cassandra: materialized queries (one table per query please)

Max Grinev has done an excellent job giving a quick overview of Cassandra here, and then has a followup blog post that discusses how to accomplish various SQL-style operations such as select, join, and group by.  Very helpful, thanks, Max!

Note that in every example, you don't write a query against the existing data.  Instead, you create a new column family to represent the query.  It's as if you write your query as a data structure.   It's basically a materialized view.   Max argues that this is OK for most applications: "However, typically in Web applications and enterprise OLTP applications queries are well known in advance, few in number, and do not change often."
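To make the "query as a data structure" idea concrete, here is a minimal sketch in plain Python. It does not use any Cassandra client API: the dictionaries stand in for column families, and the names `users`, `users_by_city`, and `build_users_by_city` are hypothetical, as is the sample data.

```python
from collections import defaultdict

# Base column family: row key -> columns (hypothetical example data).
users = {
    "alice": {"city": "Boston", "email": "alice@example.com"},
    "bob": {"city": "Austin", "email": "bob@example.com"},
    "carol": {"city": "Boston", "email": "carol@example.com"},
}


def build_users_by_city(users):
    """Build a second column family keyed by the value we want to query on."""
    users_by_city = defaultdict(dict)
    for user_id, columns in users.items():
        # Row key = city, column name = user id, column value = copied email.
        users_by_city[columns["city"]][user_id] = columns["email"]
    return dict(users_by_city)


users_by_city = build_users_by_city(users)

# The equivalent of "SELECT ... WHERE city = 'Boston'" is now one row lookup.
boston_users = sorted(users_by_city["Boston"])
print(boston_users)
```

The point of the sketch is that the "where" clause was decided when `users_by_city` was designed, not when it is read. A query on a different column would need its own structure.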

OK, fair enough, but it is definitely something to keep in mind: if you want to support ad-hoc queries in your application, then Cassandra is probably not the right choice.

It also looks like it is your job as a user of Cassandra to update all your various views when data changes. I agree with Max that denormalization and "push on change" rather than "pull on demand" is probably the right approach for highly intertwined systems like Twitter and Facebook. But it would be nice if the system helped you maintain consistency across all these denormalized copies.
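As a rough illustration of what "push on change" puts on the application's plate, here is a sketch of a write path that keeps one denormalized copy in step with the base data. It is an in-memory toy, not Cassandra code; the class `UserStore` and its fields are hypothetical names.

```python
class UserStore:
    """Toy store where the application keeps a base table and one view in step."""

    def __init__(self):
        self.users = {}          # base column family: user id -> columns
        self.users_by_city = {}  # query column family: city -> {user id: email}

    def save_user(self, user_id, city, email):
        old = self.users.get(user_id)
        if old is not None:
            # Remove the stale view entry first; otherwise the user would
            # appear under both the old and the new city.
            old_row = self.users_by_city.get(old["city"], {})
            old_row.pop(user_id, None)
            if not old_row:
                self.users_by_city.pop(old["city"], None)
        # Write the base row, then push the change to the view.
        self.users[user_id] = {"city": city, "email": email}
        self.users_by_city.setdefault(city, {})[user_id] = email


store = UserStore()
store.save_user("alice", "Boston", "alice@example.com")
store.save_user("alice", "Austin", "alice@example.com")  # alice moves
print(store.users_by_city)
```

Every extra view adds another block like this to every write, and a crash between the two writes leaves the copies out of step, which is the consistency worry raised above.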

For example, CouchDB has a very similar model of materialized views, but their views are defined as map/reduce operations on the primary document(s), and CouchDB takes care of keeping the views in sync as you update your data.  That saves me a lot of time and worry.
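For contrast, here is a sketch of the CouchDB-style arrangement, where the view is declared once as a map function and the store keeps it current. This is a Python toy, not CouchDB's API: real CouchDB views are usually written in JavaScript and indexed incrementally, whereas this version simply recomputes a document's rows on every write. `ViewStore`, `define_view`, `put`, `query`, and `by_city` are all hypothetical names, and the reduce half is left out.

```python
class ViewStore:
    """Toy document store that maintains map-defined views on every write."""

    def __init__(self):
        self.docs = {}
        self.views = {}  # view name -> (map_fn, {doc id: [(key, value), ...]})

    def define_view(self, name, map_fn):
        # Index whatever documents already exist.
        rows = {doc_id: list(map_fn(doc)) for doc_id, doc in self.docs.items()}
        self.views[name] = (map_fn, rows)

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc
        for map_fn, rows in self.views.values():
            # Replacing the per-document rows drops any stale entries.
            rows[doc_id] = list(map_fn(doc))

    def query(self, name, key):
        _, rows = self.views[name]
        return sorted(v for emitted in rows.values() for k, v in emitted if k == key)


def by_city(doc):
    # Map function: emit one (key, value) row per document.
    yield doc["city"], doc["name"]


store = ViewStore()
store.define_view("by_city", by_city)
store.put("u1", {"name": "Alice", "city": "Boston"})
store.put("u2", {"name": "Bob", "city": "Boston"})
store.put("u1", {"name": "Alice", "city": "Austin"})  # update; view follows
print(store.query("by_city", "Boston"), store.query("by_city", "Austin"))
```

The caller only ever writes to the primary documents. The bookkeeping that the application would otherwise do by hand lives inside `put`.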

Perhaps something like that exists in Cassandra and I don't know about it?

Anyway, I've been very interested in Cassandra, and Max's blog posts were quite helpful in getting my head around it. I highly recommend reading them if you'd like to learn more about Cassandra, or about what a distributed column-oriented key/value store looks like (Cassandra is modeled after Google's BigTable).