Van Couvering Is Not a Verb: June 2008

Monday, June 30, 2008

Theo Schlossnagle - Traffic spikes through the heart of your app

Theo Schlossnagle's blog is a very interesting read indeed.

What isn't entirely obvious in the above graphs? These spikes happen inside 60 seconds. The idea of provisioning more servers (virtual or not) is unrealistic. Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope and that would assume a zero second response time. This means it is about time to adjust what our systems architecture should support. The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling. At least eight times in the past month, we've experienced from 100% to 1000% sudden increases in traffic across many of our clients.

That's something to think about.

One likes to think that cloud computing is the silver bullet for scalability - use their stuff, build a system that assumes scale, and no worries, mate. But this makes it clear that there are basic operational issues that need to be kept in mind. Doing it manually may not cut it.

Amazon for example provides no tools for handling traffic increases automatically. It's your job to monitor and kick off new instances. But if this is happening in a matter of 60 seconds, you'd better be very sure you know what to do, and quickly.
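To put rough numbers on the headroom rule in Theo's quote, here is a back-of-the-envelope sketch in Python. It assumes load grows linearly with traffic, which real systems rarely manage, and the function names are my own illustration, not anything from Theo's post or Amazon's tooling.

```python
def max_absorbable_increase(utilization: float) -> float:
    """Fractional traffic increase that fits before hitting 100% utilization.

    Assumes load scales linearly with traffic (an idealization).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1 / utilization - 1


def target_utilization(spike_multiplier: float) -> float:
    """Steady-state utilization needed to survive traffic that jumps to
    spike_multiplier times normal without adding any servers."""
    if spike_multiplier < 1:
        raise ValueError("spike_multiplier must be >= 1")
    return 1 / spike_multiplier


if __name__ == "__main__":
    # The old rule: running at 70% leaves room for roughly a 40% increase.
    print(f"headroom at 70% utilization: {max_absorbable_increase(0.70):.0%}")
    # A 100% increase means 2x traffic; a 1000% increase means 11x traffic.
    for multiplier in (2, 11):
        print(f"{multiplier}x spike needs steady-state utilization of "
              f"{target_utilization(multiplier):.1%}")
```

Under these assumptions, absorbing the largest spikes Theo describes with no new servers would mean idling most of your hardware most of the time, which is why the old rule stops being practical.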

Personally, I've never had to deal with this, so I have no tips for you. If I were to start up a new site, I'd go find people who do and have a long talk with them.

Friday, June 27, 2008

New DB Features coming in NetBeans 6.5

Some cool stuff coming down the pike for NetBeans 6.5 database tooling.

SQL Editor code completion - lots of you have been asking for this, and it's in, with more on the way. Note how completion brings up the columns if the SELECT statement already specifies the table name.

SQL History - You can now look at the history of all SQL you've executed, filter it by text and/or by connection URL, and then select and insert a statement.

Editable, sortable results, multiple result tabs

This isn't integrated yet, but I had to show it to you. With this you can sort the results by double-clicking on a row header (we should have done this a long time ago), modify a row, and insert and delete rows. There is full pagination support, and you can optionally create a new tab for each statement that returns results, which is very nifty for comparing results back and forth.

This feature actually came from the SOA team in a completely different part of Sun - open source works inside as well as outside. Thanks to Ahi and Nithya and Nilesh for this!

Tired of typing in a JDBC URL?

A growing number of our users aren't even doing Java, but when they want to connect to a database, the dialog is all JDBC-ish. We've fixed that - now you just type in the parameters and we figure out the URL for you.

It turns out this is no walk in the park, as each vendor has their own parameters and URL format. So for now it works for MySQL, PostgreSQL, Java DB, Oracle and Microsoft SQL Server. For the rest, sorry, you still need to type in the URL, until I can get to them.

BTW, if you still want to see and edit the URL, you can, and we'll update the fields automatically. Nifty.
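To give a feel for why this is fiddly, here is a minimal sketch in Python of turning connection parameters into a URL. This is not the NetBeans implementation; the dictionary keys and the function name are made up for illustration. The URL shapes and default ports are the commonly documented ones for each vendor's driver.

```python
# Commonly documented JDBC URL shapes and default ports for the five
# databases listed above. Keys and names here are illustrative only.
JDBC_TEMPLATES = {
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
    # Java DB (Apache Derby) in network client mode.
    "javadb": ("jdbc:derby://{host}:{port}/{database}", 1527),
    # Oracle thin driver; here "database" is the SID.
    "oracle": ("jdbc:oracle:thin:@{host}:{port}:{database}", 1521),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
}


def build_jdbc_url(vendor, host, database, port=None):
    """Build a JDBC URL from plain parameters, using the vendor's default
    port when none is given."""
    try:
        template, default_port = JDBC_TEMPLATES[vendor]
    except KeyError:
        # Unknown vendor: the user still has to type the URL by hand.
        raise ValueError(f"no URL template for {vendor!r}") from None
    return template.format(host=host, port=port or default_port, database=database)


if __name__ == "__main__":
    print(build_jdbc_url("mysql", "localhost", "mydb"))
    print(build_jdbc_url("oracle", "dbhost", "ORCL", port=1522))
```

Even in this toy version, every vendor needs its own template, and going the other way (parsing an edited URL back into fields) is harder still.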

Tuesday, June 24, 2008

Jerry Seinfeld on George Carlin’s Life and Comedy

I loved George Carlin as a kid. I once thought of a great joke - what is the worst ice cream flavor you ever heard of. I went to a Carlin concert, and he had the same joke, with the best answer: "bologna swirl." Carlin already did it.

http://www.nytimes.com/2008/06/24/opinion/24seinfeld.html?em&ex=1214452800&en=33d98b4f11507cbf&ei=5087

Friday, June 20, 2008

Query languages for Hadoop, AND tooling for the non-relational stores

Tom White just blogged about three, count them three, different query languages being built for Hadoop.

Pig, from Yahoo! and now incubating at Apache, has an imperative language called Pig Latin for performing operations on large data files.
Jaql, from IBM and soon to be open sourced, is a declarative query language for JSON data.
Hive, from Facebook and soon to become a Hadoop contrib module, is a data warehouse system with a declarative query language that is a hybrid of SQL and Hadoop streaming.

As someone working on database tooling, I have been thinking about what we at NetBeans might do to make it easier for developers to build apps against these new platforms/products like Hadoop and CouchDB and Amazon services.

These query languages are an example - I can see an interactive query editor that lets you write and test queries directly against your Hadoop or CouchDB engine, much as you can today with SQL tools.

Another interesting question: if I write my application in terms of domain objects, what can a tool do to make it easy to map these domain objects to an underlying non-structured store like Hadoop or CouchDB or SimpleDB?

Definitely some opportunities here to make developers' lives easier.

Thursday, June 19, 2008

Kaj Arnö - Happily using NetBeans to build MySQL

After getting a demo from a NetBeans engineer on the C/C++ extensions to NetBeans, Kaj was converted - "now this was enough even for a sceptic like me to become eager."

http://blogs.mysql.com/kaj/2008/06/18/netbeans-as-ide-for-developing-mysql-itself/

Alex Bunardzic » Software Development Detox Part 5: Session

Someone who is in my boat about session. Although I still wonder if you are overburdening the client if you say "on the web each request contains all the information necessary for the server code to make a decision on what to do next."

http://jooto.com/blog/index.php/2008/06/20/software-development-detox-part-5-session/

Wednesday, June 18, 2008

Alex Bunardzic » World Wide Web is About Self-Serve

Interesting blog challenging us to question if there is a need for services, when really the web is basically about resources. I think the real service is in quality of service, making sure the resource is available when you need it.

http://jooto.com/blog/index.php/2008/06/19/world-wide-web-is-about-self-serve/

Monday, June 16, 2008

AOL Radio

I have been a long-time member of the "AOL sucks" club. I signed my name in blood as a member of this club when I tried for three days to get Internet access working for my laptop in someone's home that had AOL as the Internet provider. AOL has historically provided a horrific user interface which constantly thinks it knows better than you and gets in your way whenever you want to do anything different from the standard.

But I've found one piece of AOL that is just, what can I say, nice... It's AOL Radio, which I bumped into when reading a NY Times article about how AOL Radio will soon be available on the iPhone. I tried the Acoustic Rock station, and it's really quite high quality, and the user interface, is, well, simple and - gasp - easy to use!

I have tried a number of interfaces for free Internet radio, and they are either impossible to use, have a horrible selection, or are filled with ads. AOL Radio has none of these limitations. And all you need is Flash to get going. They do seem to have song-skipping disabled, but that's supposed to come back soon.

Pretty nice, and from such an unexpected source! :)

Friday, June 13, 2008

Facebook's Thrift in Apache Incubator

From the Facebook blog:

Thrift is a lightweight software framework for enabling communication between programs written in different programming languages, running on different computers, or both.

It includes support for C, Java, C++, Ruby, Erlang, Perl, Haskell and many others.

It sounds very interesting. I first heard of Thrift because it's used by a very interesting Amazon-like stack called Thrudb. What's nice is that, because Thrudb works with Thrift, you can use it from any language.

However, I looked at the interface for calling the Thrift APIs, and it's a bit ugly, or at least it takes some getting used to.

Anyway, something to keep an eye on, it may have legs.

Thursday, June 12, 2008

Rhapsody on my Mac

I really like the Rhapsody music service. I have been trying various ways to get this to work on my Mac.

There is a plugin for the browser that lets you access Rhapsody natively on the Mac. The only problem is that it regularly crashes Firefox, and Rhapsody support clearly has fixing this as a very low priority. I tried Safari, and that was working OK, until I upgraded to OSX 10.5.3, and now Safari regularly crashes my entire machine! Sorry, no go.

So I finally gave in, and am happily running Rhapsody natively on my Win XP VMWare instance. I reduced the memory on the instance to 512MB; otherwise, if I have two instances of NetBeans up, my system memory starts thrashing. But now that I've done that, I am happy as a clam. Mozart is lilting to me as I code, and all is well with the world.

Tuesday, June 10, 2008

Open Source EC2 - the beginning of a Scale Stack

In my last post I mentioned that what I wanted to see was the industry coalesce around an open source "scale stack" [1].

We may be seeing the beginnings of this. The High Scalability site (great blog, by the way) recently posted about a new kid in town coming out of UC Santa Barbara called Eucalyptus. They are providing an open source implementation of an elastic compute infrastructure that is interface-compatible with Amazon's EC2, which you can take and deploy on your own hardware.

This is very encouraging, and I think is a smart approach. Rather than try to build some standard that is lost in committee for years, use the de-facto standard, which in this space is Amazon.

Another piece of the puzzle is Hadoop, an open source implementation of map/reduce. Hadoop also has a distributed file system - one thing that might be worth investigating is building an S3 layer on top of Hadoop's file system.

What about the queuing service? Well, one possibility is to put an SQS API on top of ActiveMQ or OpenJMS.

Throw in CouchDB, and you're starting to get a very interesting stack indeed. I'm not sure about putting a SimpleDB interface on top of this - CouchDB is pretty darn interesting in its own right, and I think the jury is still out on SimpleDB.

[1] I am not sure if he wants me to mention his name, so I won't, but I want to acknowledge that the idea for an open source stack based on Amazon's APIs is not my own, but comes from a colleague at Sun. I think it's a great idea, and may it come to fruition.

Thursday, June 05, 2008

The exponential cost of contention

I enjoy Nati Shalom's blog, although it always has that taste of having the agenda of pushing GigaSpaces' solutions. But putting that aside, his posts are always well thought-out and well written.

I think his latest blog on the Economies of Non-Scale really drives some points home about scalability, or more to the point, the cost of non-linear scalability.

If 90% of our application is free of contention, and only 10% is spent on a shared resources, we will need to grow our compute resources by a factor of 100 to scale by a factor of 10! Another important thing to note is that 10x, in this case, is the limit of our ability to scale, even if more resources are added.

...

1. The cost of non-linearly scalable applications grows exponentially with the demand for more scale.

2. Non-linearly scalable applications have an absolute limit of scalability. According to Amdahl's Law, with 10% contention, the maximum scaling limit is 10. With 40% contention, our maximum scaling limit is 2.5 - no matter how many hardware resources we will throw at the problem.

That's something to chew on. These are real costs to your business, to your users, and to the environment. Even if you only have a teeny 10% contention in your system, that 10% will nail you faster than you can say ACID semantics. And as has become very clear to me, the final breaking point, the final point of contention in any traditional web application architecture, is the database. Get rid of that and you're home free.
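If you want to check the quoted numbers yourself, here is a small Python sketch of Amdahl's Law, treating the contended fraction as the part that cannot be spread across more resources. The function names are mine, not Nati's.

```python
def amdahl_speedup(contention: float, resources: int) -> float:
    """Speedup predicted by Amdahl's Law when `contention` is the fraction
    of the work serialized on a shared resource."""
    if not 0 <= contention <= 1:
        raise ValueError("contention must be in [0, 1]")
    if resources < 1:
        raise ValueError("resources must be >= 1")
    return 1.0 / (contention + (1.0 - contention) / resources)


def amdahl_limit(contention: float) -> float:
    """Upper bound on speedup as resources grow without bound."""
    if not 0 < contention <= 1:
        raise ValueError("contention must be in (0, 1]")
    return 1.0 / contention


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"10% contention, {n:>4}x resources: "
              f"{amdahl_speedup(0.10, n):.2f}x speedup")
    print(f"limit at 10% contention: {amdahl_limit(0.10):.1f}x")
    print(f"limit at 40% contention: {amdahl_limit(0.40):.1f}x")
```

With 10% contention, 100 times the resources buys a little over 9x, approaching but never reaching the limit of 10, so the quote's "factor of 100 to scale by a factor of 10" is a rounded figure.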

How do you do that? Well, there are a lot of people trying to solve this problem with things like space-based architectures, eventual consistency, distributed map/reduce and Stonebraker's H-Store architecture. Anything to let each instance stand on its own and not have to serialize with the rest of the system at any point, in any form.

Some people argue that scalability is so hard that you shouldn't think about it until you need to. But I really believe that if you do enough to educate yourself and make some wise choices, you will be very glad you did.

What I'd like to see is the industry coalescing around some best practices, an open source "scale stack" à la LAMP, tools, community, and hosting environments like Amazon, that allow developers to easily build applications that will scale from the get-go. That's where I want to see things go. That way you don't have to throw your hands up and hope for the best. Because as you can see, the costs can be deadly.

Something to think about...