Tuesday, July 20, 2010

Mass extinctions, basalt flows, and us

Verbatim from my Pa (Dr. John Van Couvering, geologist and all-around Scientist), who wrote this up for his local neighborhood newsletter in upstate New York:

Most of us have heard of the great extinction that wiped the dinosaurs off the face of the earth.  Of course this didn’t really happen, because birds – aha! -- are dinosaurs. But few may be aware that the human presence on the planet follows, not one, but four narrow escapes from extinction, going all the way back before dinosaurs even evolved. We are rather like Fearless Fosdick (Li’l Abner’s comic hero) who pops up unscathed, with the lid on his head, from his hiding place in a trash can that has been riddled to shreds by the mob’s tommy guns. The crooks are gobsmacked. “Fosdick! We shot ya fulla holes!” “Took a bit of dodging,” he jauntily replies.

What did we dodge? In fact, those “extinction events” are deadly beyond belief: moments in planetary history when nearly everything on land dies, as well as a great deal of marine life. In the last of these, 65 million years ago, four entire orders of dinosaurs were exterminated, plus all but the airborne varieties of a fifth order, as well as two out of three orders of coral and entire groups of other marine life. To put the extinction of an order in perspective, extinguishing the modern Carnivora would require that every single living animal in that order -- every last cat, lion, tiger, leopard, cheetah, puma, jaguar, hyaena, aardwolf, civet, dog, wolf, fox, bear, panda, seal, sea lion, walrus, weasel, mink, skunk, badger, wolverine, otter, raccoon, kinkajou, coati mundi and meerkat – would have to perish. If even one lonely pair of any of these species survived to reproduce, so would the Order Carnivora. The magnitude of disaster that can totally erase not one but a dozen orders is difficult to comprehend.

Huge meteorite impacts have been found to coincide with some major extinction events, but they may have been just the last straw. For one thing, giant impacts are turning out to be surprisingly common, with little or no extinction effect. While the asteroid that punched into Yucatan may have finished off Tyrannosaurus and the tetracorals, they were apparently sick and dying already. A few million years later, the equally large flying rock that blew a 57-mile-wide hole into the bedrock under what is now Delaware and Maryland had no effect on earthly life whatever. Not only that, no impact has been found to coincide with the worst extinction of all, the Permo-Triassic event 250 million years ago that killed off all but one species of non-amphibian land animals – which species, of course, is our ancestor. The real cause of extinction events appears to be flood basalts – extraordinary volcanic episodes that go on for a million years as flow after flow of lava wells from cracks in the continent to build up country-sized mounds called “traps”. It is probable, but not yet proven, that the gases emitted from the thousands of cubic kilometers of flood basalts create toxic pollution of land and sea. The Deccan Traps of central India mark the end of the dinosaurs, while the traps that underlie the steppes of Siberia are the same age as the great Permo-Triassic extinction.

Given the odds, it is to be expected that mass extinctions have chancy, even unfair consequences. After the Permo-Triassic massacre our ancestral line successively developed into anapsids (turtles), synapsids (mammals) and diapsids (everything else). Each of these is a higher stage of evolution, with a basic physiology better adapted to life on land. By some freak chance, a single lowly synapsid got through the extinction at the end of the Triassic, 190 million years ago, along with a variety of diapsids on their way to crocodiles, lizards, and all kinds of dinosaurs. This primitive synapsid line, which seems to have favored bugs and fruit, quietly developed warm-blooded circulation, milk glands and eventually a uterus. And when the grand house of the dinosaurs came crashing down, guess who was hiding under the ruins?

Our final lucky break came around 48 million years ago, when a solitary species of primitive primate got stranded on the (then) island of Africa, never to see its many relatives in North America and Asia again. Bingo, Columbia River basalts. This was not one of the great extinction events, but it was enough to kill every primate in the world – except for that forgotten waif in Africa. Now, many lemurs and lorises and monkeys and apes later, here we are, self-importantly going on about our human-caused extinction event. If we knew how far we have yet to go, perhaps we just might give it up.

Friday, July 16, 2010

Apache Cassandra: materialized queries (one table per query please)

Max Grinev has done an excellent job giving a quick overview of Cassandra, and has a follow-up blog post that discusses how to accomplish various SQL-style operations such as select, join, and group by. Very helpful, thanks, Max!

Note that in every example, you don't write a query against the existing data. Instead, you create a new column family to represent the query. It's as if you write your query as a data structure; it's basically a materialized view. Max argues that this is OK for most applications: "However, typically in Web applications and enterprise OLTP applications queries are well known in advance, few in number, and do not change often."

OK, fair enough, but it is definitely something to keep in mind: if you want to support ad-hoc queries in your application, then Cassandra is probably not the right choice.

It also looks like it is your job as a user of Cassandra to update all your various views when data changes. I agree with Max that denormalization and "push on change" rather than "pull on demand" is probably the right approach for highly intertwined systems like Twitter and Facebook. But it would be nice if the system helped you maintain consistency across all these denormalized copies.
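Here is roughly what that bookkeeping looks like, as a minimal sketch. This is plain Java with in-memory maps, not Cassandra or any Cassandra client: each Map stands in for a column family, and the names (TweetStore, tweetsByUser, timelineByFollower, postTweet) are hypothetical, chosen to echo the Twitter example. Each query gets its own precomputed structure, and every write has to fan out to all of them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a Cassandra-style schema: one "column family" per query.
// These maps are NOT Cassandra APIs; they only illustrate the data layout.
class TweetStore {
    // Primary data: tweet id -> tweet text
    final Map<String, String> tweets = new HashMap<>();
    // Query 1, materialized: "tweets written by user X"
    final Map<String, List<String>> tweetsByUser = new HashMap<>();
    // Query 2, materialized: "tweets in the timeline of follower Y"
    final Map<String, List<String>> timelineByFollower = new HashMap<>();
    // Who follows whom: user -> set of followers
    final Map<String, Set<String>> followers = new HashMap<>();

    void follow(String follower, String user) {
        followers.computeIfAbsent(user, k -> new TreeSet<>()).add(follower);
    }

    // "Push on change": one logical write fans out to every view.
    // Forget one of these writes and that view silently goes stale.
    void postTweet(String user, String tweetId, String text) {
        tweets.put(tweetId, text);
        tweetsByUser.computeIfAbsent(user, k -> new ArrayList<>()).add(tweetId);
        for (String f : followers.getOrDefault(user, Set.of())) {
            timelineByFollower.computeIfAbsent(f, k -> new ArrayList<>()).add(tweetId);
        }
    }

    // Reads are simple lookups against the precomputed view.
    List<String> timeline(String follower) {
        return timelineByFollower.getOrDefault(follower, List.of());
    }
}
```

Reads become trivial lookups, but the cost moves to the write path: add a third query and postTweet needs a third write, and nothing in the store will tell you if you forget it.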

For example, CouchDB has a very similar model of materialized views, but their views are defined as map/reduce operations on the primary document(s), and CouchDB takes care of keeping the views in sync as you update your data.  That saves me a lot of time and worry.
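For contrast, here is an equally hypothetical sketch of the CouchDB style, again in plain Java rather than CouchDB's real JavaScript view functions. The view is declared once as a map function, and the store (not the application) re-runs it as documents change. DocStore, defineView, and query are invented names; also note that real CouchDB updates its view indexes incrementally and lazily, when a view is queried, whereas this toy simply rebuilds on every save:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// A toy document store that keeps its views in sync for you.
class DocStore {
    private final Map<String, Map<String, String>> docs = new HashMap<>();
    // view name -> map function: document -> keys it emits
    private final Map<String, Function<Map<String, String>, List<String>>> viewFns = new HashMap<>();
    // view name -> (emitted key -> ids of the documents that emitted it)
    private final Map<String, Map<String, List<String>>> viewIndex = new HashMap<>();

    void defineView(String name, Function<Map<String, String>, List<String>> mapFn) {
        viewFns.put(name, mapFn);
        rebuild(name);
    }

    void save(String id, Map<String, String> doc) {
        docs.put(id, doc);
        // The store, not the caller, brings every view up to date.
        for (String name : viewFns.keySet()) {
            rebuild(name);
        }
    }

    List<String> query(String view, String key) {
        return viewIndex.getOrDefault(view, Map.of()).getOrDefault(key, List.of());
    }

    // Naive full rebuild for clarity; a real system would update incrementally.
    private void rebuild(String name) {
        Function<Map<String, String>, List<String>> mapFn = viewFns.get(name);
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : docs.entrySet()) {
            for (String key : mapFn.apply(e.getValue())) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        viewIndex.put(name, index);
    }
}
```

The application only ever calls save and query; when a document changes, any stale view entries disappear without the caller writing a line of sync code.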

Perhaps something like that exists in Cassandra and I don't know about it?

Anyway, I've been very interested in Cassandra, and Max's posts were quite helpful in getting my head around it. I highly recommend reading them if you'd like to learn more about Cassandra, or what a distributed column-oriented key/value store looks like (Cassandra's data model is based on Google's BigTable).

Friday, May 14, 2010

Facebook's deal with the Devil



I wanted to share what I thought was a very enlightening blog post by Danah Boyd on the recent ruckus around Facebook and her issues with how Facebook is handling all this.

I think she particularly nailed what's bothering me here:
I’d be a whole lot less pissed off if people had to opt-in in December. Or if they could’ve retained the right to keep their friends lists, affiliations, interests, likes, and other content as private as they had when they first opted into Facebook. Slowly disintegrating the social context without choice isn’t consent; it’s trickery.
It's clear that Facebook really really really wants to make all the content people are posting as public as possible, because they are not making money on ads.

I can just see all these marketers from Big Companies in Big Conversations with Facebook execs in closed rooms waving wads of cash at them saying "Boy do we want this content! People are telling us exactly what they like and what they care about! Just make this content available, and we will give you so much money you'll be swimming in it."

In the other room are the board members saying "so, when are you going to monetize this big thing you've got going?"

So, you take the Devil's deal. What's a little privacy? I mean, people want to be transparent anyway. Rationalize and justify, I know that path. So, hoping nobody's watching, and putting on a show of "choice" and "options", you open up the content, and the guys with the money cheer.

If it were me, and I were at Facebook, I'd have a bad taste in my mouth right now...

Thursday, May 06, 2010

Time to tie my camel to the tree

I like to not worry too much about security, and try to feel relaxed about living my life on the planet. But I also know that you need to take necessary care. As the old saying goes "trust in God, but tie your camel to the tree."

I read today in the New York Times that hackers have put 1.5 million Facebook user credentials up for sale.

I also got a phone call today from someone saying they were extending my Popular Science subscription, and confirmed my address and name. Then they said if I could just give them my credit card number they could go ahead and extend my subscription.

When I refused, they were OK, but asked for my email so they could send me an email confirmation. When I refused again, they wanted to have me talk to their "manager" and I just hung up. This is the second time I've gotten a call like that. This isn't just random identity theft - this is a business. And I suspect a very profitable one.

I was also reading about online payment services, and how PayPal is targeted for attacks on a regular basis.

And I have a feeling it's only going to get worse - the hackers are on the attack, and from what I know about online sites, they try to be secure (maybe) but with dynamic client-side code and phishing and Trojan horses and worms and insider attacks and just dumb human error, the weaknesses will be found and taken advantage of, more and more and more.

I can't protect myself from all of this, but I do need to take necessary precautions. We already have a shredder and shred anything that has our personal information on it. But I've been lazy about my passwords. I've kept the same one for years; it's short, and it's everywhere. Well, that changed tonight. The key passwords have all been changed, and I'll do more as I bump into them.

Symantec has me change my password every three months. I'll be using that as a trigger for me to change all my online passwords.

I've had enough, it's time to tie my camel to the tree.

Wednesday, May 05, 2010

Diagrammr - very cool diagramming tool

I just want to give a shout-out to a very nice diagramming tool I came across via a referral from my Twitter stream.

http://www.diagrammr.com lets you write simple English phrases that describe either relationships between entities/objects/concepts or interactions between them. For each sentence, the tool creates two nodes (subject and object) and a path between them (either a relationship or an interaction). You can easily share a diagram, collaborate on it, or embed it in a page (just right-click the diagram, copy the image URL, and paste it into an img tag).
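The "one sentence, two nodes, one path" idea is simple enough to sketch. This is a toy in Java, not Diagrammr's actual parser: the class names are made up, and the rule (first word is the subject, last word is the object, everything in between labels the edge) is an assumption for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One edge per sentence: subject --label--> object.
final class Edge {
    final String from;
    final String label;
    final String to;

    Edge(String from, String label, String to) {
        this.from = from;
        this.label = label;
        this.to = to;
    }

    @Override
    public String toString() {
        return from + " --" + label + "--> " + to;
    }
}

class SentenceGraph {
    final Set<String> nodes = new LinkedHashSet<>();  // keeps first-seen order
    final List<Edge> edges = new ArrayList<>();

    // Naive, hypothetical rule: first word = subject, last word = object,
    // words in between = edge label. Trailing punctuation is dropped.
    void add(String sentence) {
        String[] words = sentence.trim().replaceAll("[.!?]$", "").split("\\s+");
        if (words.length < 3) {
            throw new IllegalArgumentException("expected subject, verb, object: " + sentence);
        }
        String from = words[0];
        String to = words[words.length - 1];
        String label = String.join(" ", Arrays.copyOfRange(words, 1, words.length - 1));
        nodes.add(from);  // a repeated noun reuses its existing node
        nodes.add(to);
        edges.add(new Edge(from, label, to));
    }
}
```

Feeding it "Browser sends requests to Server." and "Server reads CouchDB" yields three nodes and two edges, with Server shared between them. Multi-word names such as "web server" break this naive rule, so a real tool needs something smarter.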



I can imagine even doing this live during a design walkthrough or presentation, to help make the learning more dynamic.

The web site doesn't say who the author is, but kudos to whoever they are, very creative and very simple.  Thanks!

Bay Area casual carpool - an experiment in market dynamics

For the last year I have been using the "casual carpool" that has evolved on its own here in the Bay Area. It costs $4.00 to cross the bridge, and $3.50 to take BART into San Francisco. The bridge is also always jammed.

However, if you have a carpool of three people, you get a special lane through the bridge tolls, and there is no charge. As a natural law of the free market, the casual carpool developed. There are agreed-upon locations, and there is a line of people and a line of cars. Two riders get into each car, and together the three of you go across the bridge for no charge and faster than either BART or driving alone. Perfect!

Now Caltrans has thrown a wrench into a beautifully running machine: as of June they will be charging $2.50 for carpools to go across the bridge. Ugh. They could at least have made it $3 so it would be easy to split among three people.
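A quick check of the arithmetic: $2.50 does not divide evenly by three (about 83.3 cents each), while $3 would be exactly $1 apiece. This small Java sketch (a hypothetical helper, working in whole cents to avoid floating-point rounding) shows how awkward the split is:

```java
// Splits a toll, in whole cents, as evenly as possible among the people in a car.
// When it doesn't divide evenly, the first (toll % people) shares get one extra cent.
class TollSplit {
    static int[] split(int tollCents, int people) {
        if (people <= 0) {
            throw new IllegalArgumentException("need at least one person");
        }
        int[] shares = new int[people];
        int base = tollCents / people;
        int remainder = tollCents % people;
        for (int i = 0; i < people; i++) {
            shares[i] = base + (i < remainder ? 1 : 0);
        }
        return shares;
    }
}
```

TollSplit.split(250, 3) comes out to shares of 84, 83, and 83 cents, while TollSplit.split(300, 3) is a clean 100 cents each.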

So what happens? Do people stop doing casual carpool? If not, do riders pay, or does the driver eat the cost? Do we have competition in the lines, with people jockeying for the front of the line if they're willing to pay more to get into the car? What happens when there is a line of cars and no people? Do cars willing to take riders for no charge get precedence? Do they stick a sign out of their window saying "will take riders no charge?" Does each car have a sign on the window saying how much you have to pay to ride? What happens if one rider pays and the next rider to get in refuses to pay?

As a driver, I think I would just take whatever people are willing to pay me.

As a rider, I would pay what I feel is my fair share and also an amount to maintain good will - I value good will a lot when riding in someone's car :). So I'll probably be prepared to pay $1, regardless of what the other rider pays. I am not interested in haggling with either the driver or the other rider - this just creates unnecessary tension, which is not worth it for me in my daily commute.

How this will ultimately evolve, I'm not sure, but it will be a very interesting experiment in market forces.

Friday, April 16, 2010

In code, a rose by any other name smells less sweet


I regularly get jibed by coworkers for having naming discussions. When my daughter was hanging out at the office the other day and sitting in the back of one of my meetings, she snorted trying not to laugh as we kept going back and forth on how to name something. Just yesterday someone forwarded me Juliet's quote "What's in a name? A rose by any other name would smell as sweet." Very true, very true.

But in code, there is a lot in a name. A different name does not smell as sweet.

The name you give something has a lot of power. In many spiritual traditions, great importance is given to words. Kashmir Shaivism holds that the world is created through the power inherent in syllables and words - matrika shakti. Syllables actually concentrate the universal energy and create form, like vibrations of sound forming shapes in water or grains of sand.



I find this to be particularly true in code. The name you give something gives it a certain power. If you name it well, the role and intent of the class becomes crystal clear. It helps you refine what the class is doing, and strip away what it shouldn't be doing. If it is poorly named, often the code in the class is unclear in its design and role as well.

A great example of fuzzy naming is the famous "Utils" class. You might as well call this class "DumpingGround" because that's what it becomes. The same often goes for FooManager, FooHelper, and so on. The name doesn't give any real idea what the class is for, so it becomes a "whatever" class. Hm, I have a method I want to write, and I don't really know how it fits into the overall design, but hey, here is this Utils class, I'll just use that!

Names also become short-hand for well understood design patterns. If you name something FooBuilder, FooProxy, FooObserver, and so on, then readers expect it to generally follow the pattern. Misusing these well-understood suffixes is asking for trouble.
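To make the point about suffixes concrete, here is a small Java sketch of a class that earns the Builder name. EmailMessage and EmailMessageBuilder are hypothetical examples, not from any real codebase; what matters is that the class does what its suffix promises and nothing else:

```java
import java.util.ArrayList;
import java.util.List;

// What the builder produces: an immutable value object.
final class EmailMessage {
    final String to;
    final String subject;
    final List<String> cc;

    EmailMessage(String to, String subject, List<String> cc) {
        this.to = to;
        this.subject = subject;
        this.cc = List.copyOf(cc);  // defensive, unmodifiable copy
    }
}

// A FooBuilder that behaves like a Builder: fluent setters that return this,
// and a build() that validates the collected state and creates the object.
final class EmailMessageBuilder {
    private String to;
    private String subject = "";
    private final List<String> cc = new ArrayList<>();

    EmailMessageBuilder to(String address) { this.to = address; return this; }
    EmailMessageBuilder subject(String s) { this.subject = s; return this; }
    EmailMessageBuilder cc(String address) { cc.add(address); return this; }

    EmailMessage build() {
        if (to == null) {
            throw new IllegalStateException("recipient is required");
        }
        return new EmailMessage(to, subject, cc);
    }
}
```

A call like new EmailMessageBuilder().to("a@example.com").subject("hi").build() reads exactly the way a reader expects a Builder to read. If the same class also sent the message or wrote to a database, the name would be lying.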

I often find myself renaming classes over and over again as I work on a piece of code. Each time it helps me refine my understanding. It's like the class and its role in life is slowly taking form out of the primordial ooze, until it stands there clear and strong, with a well-defined shape and purpose.

So laugh and snort all you want, ye plebeians. Shakespeare wrote beautiful, beautiful prose, and I am sure he took great care to find the right words. He even invented many words we continue to use to this day. To me, writing good, clear code is like the craft of prose: there is a lot to a name, and finding the right name is both an effort and a joy.

Friday, April 09, 2010

Deploying a Google Web Toolkit app to CouchDB using couchapp

I've been working on a side project using Google Web Toolkit (GWT), and I'm using CouchDB as the backend store.

There is a project called couchapp which makes it very easy to write an HTML/JavaScript-only application (you know, the new cewl way to build apps), and store it directly in CouchDB.

Huh? What are the benefits of doing this?

Well, offline support is the big one for me. If I install a CouchDB instance on my local machine, and then set up replication, my app and its data can automatically be replicated to my local CouchDB. I lose my connection, that's fine, everything I need is there. I get back online, my changes are automatically replicated back to my CouchDB in the cloud.

It also makes it easy to share apps: just publish to a common 'app hub' and anyone who wants to can subscribe and get the app. If you're feeling kumbaya, your app can be read/write, so that app users can not only modify the content, but also modify your app itself and replicate their changes back up.

One more thing: with my app deployed to CouchDB, I don't need to write server-side code. None. No PHP, servlets, JAXB, XML, MySQL, blah blah blah. I can directly access CouchDB (and other web services like Facebook, Twitter, etc.) from my client-side JavaScript. To me that's a much simpler, cleaner environment.

But the thing is, couchapp by itself just supports HTML, JavaScript, and attachments. After much soul searching and feeling un-hip, I have finally come to terms with the fact that I am comfortable and happy in Java. I could learn JavaScript, but I think and breathe Java, it's what comes naturally to my fingers. But Java isn't native to the browser, and that's not the way the wind is blowing. So recently I took another look at Google Web Toolkit, and I liked what I found:
  • Dynamic client-side browser-native apps without having to write a lot of HTML/JavaScript. Yeah, I like that.
  • Work in Eclipse, a comfortable place for me to be
  • Excellent integration with CSS, allowing easy styling of widgets by applying style names to your GWT Java widgets
  • Static typing. I actually like that; I have much more confidence that when I run it, it works.
  • Very easy code/run/debug cycle, in Eclipse
  • No need to deploy, just refresh your browser and your new changes are in
  • Excellent support for AJAX-style HTTP requests
  • Lots of widgets, and some very cool libraries, including a visualization library
  • Open source under a flexible license
  • Strong community
Basically, in the GWT world, JavaScript is like your bytecode - you really don't need to look at it and in general you don't care. As with 'asm' in C, you can 'drop into JavaScript' if you need to, but really that can be kept to a minimum. I highly recommend doing the tutorial, it's one of the best I've worked with in a while.
So, what do couchapp and GWT have to do with each other?

Well, it turns out it's pretty darn easy to drop a GWT app into the couchapp framework, run 'couchapp push', and voila, your GWT app is deployed to a CouchDB instance.

Take a look at the GWT StockWatcher tutorial app running out of CouchDB on Cloudant. Add some pretend symbols and it simulates random fluctuations in stock price. This was all written in Java, compiled down to JavaScript, and then deployed to a CouchDB instance on Cloudant with a simple one-line command.

So, here's all you need to do:
  • Get couchapp (on Linux or Mac you should be able to run sudo easy_install -U couchapp)
  • Run couchapp generate myapp to generate a directory structure for your app.
  • Remove everything under myapp/_attachments. You can keep the views if you want, or remove them.
  • Get GWT (preferably the plugin for Eclipse)
  • Create a GWT project
  • When you're done testing/debugging locally, compile your GWT project for deployment
  • Copy everything under the 'war' directory except WEB-INF to the _attachments directory of your couchapp application directory
  • Run couchapp push (set up your .couchapprc first with the right settings)
  • Go to the URL provided by couchapp push and you should see your app running beautifully
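If you find yourself repeating the copy step, it is easy to script. Here is a minimal Java sketch using only java.nio.file (the class and method names are made up, and the two paths are whatever your GWT project and couchapp directory actually use). It copies the compiled war directory into _attachments, skips WEB-INF, and does not delete anything already in _attachments, so clear it out first as in the earlier step:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Copies a compiled GWT "war" directory into a couchapp's _attachments
// directory, skipping WEB-INF (server-side files CouchDB has no use for).
class WarToAttachments {
    static void copy(Path war, Path attachments) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(war)) {  // parents come before children
            sources = walk.collect(Collectors.toList());
        }
        Path webInf = war.resolve("WEB-INF");
        for (Path src : sources) {
            if (src.startsWith(webInf)) {
                continue;  // skip WEB-INF and everything under it
            }
            Path dest = attachments.resolve(war.relativize(src).toString());
            if (Files.isDirectory(src)) {
                Files.createDirectories(dest);
            } else {
                Files.createDirectories(dest.getParent());
                Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

Something like WarToAttachments.copy(Path.of("StockWatcher/war"), Path.of("myapp/_attachments")) followed by couchapp push would then cover the last few steps.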