Friday, February 29, 2008

Good signs from Monty

Monty just posted a blog about the status of Maria and his proposal to have an architectural meeting at the MySQL Conference. Then he said something at the end that I thought was pretty promising. I hope he can make this fly!

I hope that within Sun we will get resources to change our current policies, priorities and in some ways the whole engineering organization to make the development model much friendlier to outside participants. It should be as easy for an outsider to get a patch into the MySQL server as someone working for MySQL. This is one of the things I would like to spend my time on inside of Sun!

You go, Monty!

Thursday, February 28, 2008

Amazon, Hadoop and the New York Times

I just read this interesting entry on the Amazon Web Services blog talking about the combination of Amazon EC2 and Hadoop. I was mildly interested but then this quote caught my eye.
Everyday, I hear new stories about running Hadoop on EC2. For example, The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw image TIFF data (stored in S3) into 1.1 million finished PDFs in the space of 24 hours at a computation cost of just $240.
Pretty breathtaking...
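
To put those figures in perspective, here is a quick back-of-the-envelope calculation using only the numbers in the quote. It assumes (my assumption, not something the quote states) that all 100 instances ran for the full 24 hours and that the $240 covers compute only.

```python
# Figures taken from the quote above.
instances = 100
hours = 24
total_cost_usd = 240
pdfs = 1_100_000

# Assumption: every instance ran for the whole 24 hours.
instance_hours = instances * hours
implied_rate = total_cost_usd / instance_hours  # dollars per instance-hour
cost_per_pdf = total_cost_usd / pdfs            # dollars per finished PDF

print(f"{instance_hours} instance-hours")
print(f"${implied_rate:.2f} per instance-hour")
print(f"${cost_per_pdf:.5f} per PDF")
```

Under that assumption the bill works out to about ten cents per instance-hour, and a small fraction of a cent per PDF.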

Monday, February 25, 2008

Short-sightedness and the iPhone

My wife and I have this deal that when I get a bonus (thanks, Sun!), we split half of it between us and the other half goes into the Money Pit.

So I had this $$ burning a hole in my pocket, and I had just finished a Big Push to get some MySQL tooling into NetBeans (a topic for another blog). I deserved a treat. I also felt it was my patriotic duty to inject Consumer Confidence and Cash into the economy.

I've been wanting to switch wireless services. I find that I really want to use the Internet, and Verizon charges an arm and a leg for its data service. AT&T is downright reasonable. But of course you can't take your phone with you (that's just horrific, IMHO), and so my Treo is tethered to Verizon. I thought I had bought that Treo, but I realize now I was only renting it; it actually belongs to Verizon.

Meanwhile, iPhones are popping up all around me, and I interviewed many users, a number of them quite picky and techie, like me. And they all love them. Then I hesitated because it didn't have GPS. Well, that's been solved, sort of, with its cell tower triangulation.

A friend convinced me to take a look at the Blackberry. It has all the bells and whistles, and even has video recording and real GPS. So I went into the AT&T store and started looking at it. And this was where things started to break down. I just couldn't figure it out, and the print was so tiny. I'm getting older: gray hair, bad back and bad eyes. I could not for the life of me even read what the applications were; I had to wear my reading glasses just to use the phone.

Then I tried the browser and it was O So Sad. All the images crammed into a tiny screen. It was like living on the 7 and 1/2th floor.

Then I ambled over to the sleek iPhone, like a rabbit into a fox's den. Come Closer it cooed. Feel my sleek lines. Press my buttons. Try the Two-Fingered Zoom Gesture. Look at my Browser. Come to me...

Well, when I could see real web pages, and zoom in so I could actually read them, and the numbers and letters were actually readable for me without putting on my glasses, I was finally sold. A phone for Old People.

I feel a little dirty, like I have just made a Faustian deal with Mr. Jobs, but I do have to admit I love this new iPhone.

Friday, February 22, 2008

Jon Udell makes a good case for a simple db-to-web tool

Jon Udell makes a very good case for providing functionality that makes a database very easily web-accessible. The NetBeans DB-to-REST tool is a good example of this, and is a very good start.

Tuesday, February 12, 2008

A one week hellish trip to nowhere

A family acquaintance, whose name I won't mention for reasons that will be obvious, sent me this missive.

Ye gods, what a path through Purgatory... It sounds like one of my nightmares where I get lost on trains and planes and can't even remember where I was going in the first place (metaphor for life? nah...)

All this time he's been sending me email, and I'm trying to figure out how he's up so late at night in India. Now I realize he was doing his email from hotels, airport benches and who knows where else. Sigh...

Another family acquaintance responded in his usual humorous form: "and I was just fighting freeway traffic!"

It's snowing in New York. How do I know? Because I'm in New York, instead of New Delhi, India. I wish it were a long story, but it was only a long plane ride.

After leaving London on a 7:30 am flight after not sleeping the night before (the taxi came at 4:30 am), I arrived in Helsinki at noon, then hung around the tidy airport for an hour until called to queue for my flight to New Delhi. I handed over my boarding card and passport, and watched with half a mind (my mode of concentration up to this point) until the Finn Air employee told me that I didn't seem to have a visa. "Oh, I don't need a visa for India," I said.

Some clickety-clacks on the computer and a phone call later, I was told, with ruthless Scandinavian efficiency, that indeed I did need a visa, and while the plane was going to New Delhi, I was not. Americans, it seems, are not welcome without reservation the world over. Americans who can't be bothered to check visa requirements doubly so...

It was Saturday, all offices closed until Monday, usual 10-day wait for visas. For a microsecond I toyed with the idea of a wintertime vacation in Helsinki, but then, looking out the window at the 2:30 pm sunset, I decided to go home.

That's when I began to get very acquainted with the high cost of bad planning. From 2:30 pm Helsinki time on Saturday, this was my path.

1. Exit (forlornly) the Helsinki departure lounge.

2. Haggle with Finn Air to get a ticket back to London. They eventually gave me my return leg from Helsinki without charge.

3. Back through security

4. 6 pm flight to London, arrive 8-ish. Wait an hour for my bag, which with ruthless Scandinavian efficiency Finn Air told me I had to check (1 kg. overweight)

5. Hie myself to American Airlines, get there just before the ticket desk closed. Of course I had just missed the last flight, but could take another flight the next day ($200 change fee).

6. A night of purgatory at the Comfort Inn Heathrow, only 1/2 hour away and "cheap" at 75 pounds ($150). Hot water worked fine, but no cold water. No washcloth either, but I managed with the corner of a towel and a decent time to let the water cool. I could have gone into London, but that's a long trip and friend Peter was away at a conference and I'd thrown his keys through the mail slot. And I was ashamed.

7. Back to Heathrow and "has your luggage been with you at all times?" (Yes, for the last week.)

8. Arrive New York Sunday night.

During this time I told just one person what had happened, the guy I was supposed to meet the next morning in New Delhi.

No harm done; the schmoozing that I was to do in India has been rescheduled back here in the States, but I feel like a fool. Since then, I've told a few other people, citing "family reasons."

Hotel Facebook: you just can't leave

This New York Times article about how impossible it is to remove your data from Facebook underscores the need for data property rights.

McCain: Bomb, bomb, bomb Iran

Ouch. I'm sure some of it's taken out of context, but my goodness. If you haven't seen the original video "Yes We Can," you might want to see it first...

New York Times - Berkeley At its Best (Worst?)

Sigh... Good ol' Berkeley. My home town and alma mater. Sniff. By the way, I like the way patchouli smells, and I like being hip. :) And anyway, we have the best coffee, bar none. So there.

Rich Green: A vision for virtual machines, development and cloud computing

I think Rich nails the long-term vision, how you can have development "stacks" ready to install on your dev machine, and deploy "stacks" ready to push to the hosted environment. Note Amazon EC2 already does image-based hosting... a brave new world...

Sun now has its own desktop VM product - VirtualBox

I just read the news that Sun is buying innotek, maker of the open source desktop virtual machine product, VirtualBox. I used VirtualBox before and it was nice, but since it seemed like a small concern and it was not clear what its long-term prospects would be, I moved over to VMWare.

But now that it's powered by Sun, as a fan of open source over proprietary source, as a Solaris user (and as a Sun employee :)), I think it's time to take another look at VirtualBox.

Monday, February 11, 2008

Fake Steve on Google, "the cloud" and Microsoft: funny yet serious

Fake Steve has a great conversation with Squirrel Boy, and as much as I laughed, you know, I think he's really nailed it. Although of course Sun will always stay independent and be one of the big boys :)

Sunday, February 10, 2008

Data Property Rights, Not Portability - GigaOM

I think Nitin nails this one - there is an ongoing attitude towards personal data in the networked world which has continued to bother me, and "the right to leave" only scratches the surface of the issues.

Thirteen unlucky steps for installing MySQL 5 on Windows Vista

I haven't used Vista, and this doesn't motivate me to do so... What a mess!

Friday, February 08, 2008

With Amazon SimpleDB, the world's a string, and I'm feeling a little frayed

Everything in Amazon SimpleDB is stored as a string and compared as a string. The consequences of this are explored in some detail in the new article on Amazon's pages.

What's the reason for doing this? In another article the rationale is explained:
This provides application designers with the flexibility of not predefining different data types for their attributes, but rather changing them dynamically, as the application requires. A good example of why this flexibility is useful is “1984” – should it be treated as an integer, a title of a book, or a date? What happens if the application decides to store it as an integer, only to later realize that it was meant to be the title of a book and should be treated as a string? Amazon SimpleDB provides the flexibility of storing all data in one format, allowing developers to make data type decisions in the application layer without the data store enforcing constraints.
Some of the consequences are pretty stunning, especially when you're working with numbers.
The first step for representing the number ranges is to ensure that every number in the dataset is positive. This can be easily achieved by choosing an offset, such that it is larger than the module of the smallest expected negative number in your dataset. For example, if the smallest expected number in your dataset is -12,000, choosing offset = 100,000 may be safe.
Then they follow with an example:
  • Original dataset: {654, -12000, 3610, 0, -23}
  • Negative number offset: 100,000
  • Dataset with offset applied: {100654, 88000, 103610, 100000, 99977}
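
Here is a minimal Python sketch of that offset step, using the numbers from the example above. The function name `apply_offset` is my own and purely illustrative; it is not part of any Amazon API.

```python
OFFSET = 100_000  # the offset chosen in the example above


def apply_offset(values, offset=OFFSET):
    """Shift every value by the offset so the whole dataset is non-negative."""
    shifted = [v + offset for v in values]
    if min(shifted) < 0:
        # The offset was too small for at least one value.
        raise ValueError("offset does not cover the smallest value")
    return shifted


print(apply_offset([654, -12000, 3610, 0, -23]))
# [100654, 88000, 103610, 100000, 99977]
```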
Then there is zero-padding:

Once all the numbers in the dataset are positive, there is another step necessary to ensure that values are properly represented for lexicographical comparisons. For example, number 2 and 10, if converted directly into strings "2" and "10" will not compare properly, as "10" comes before "2" in lexicographical order. However, if we zero-pad number 2 to be represented as "02", the comparison will execute as expected. But what happens if we decide to add number 200 to the dataset at a later point? Zero-padding with a single "0" will not work anymore. Therefore, application designers may follow the next steps for zero-padding:

  1. Determine the largest possible number in your dataset. Remember, that if you are using offsetting to take care of the negative number conversions, you have to take that into account as well by ensuring that your largest possible number is determined after you have added the offset to all the values.
  2. Based on the largest possible number, determine the maximum number of digits before the decimal point that you may have in your dataset.
  3. Convert all the numbers in your dataset to the proper string representation by appending as many "0" as necessary in front of each number to ensure that the total number of characters, representing the portion of the number before the decimal point, matches the maximum number of digits determined in Step 2 above.


  • Original dataset: {14.58, -12536.791, 20071109, 655378.34, -23}
  • Negative number offset: 100,000
  • Dataset with offset applied: {100014.58, 87463.209, 20171109, 755378.34, 99977}
  • Zero-padded dataset representation: {00100014.58, 00087463.209, 20171109, 00755378.34, 00099977}
  • Original query: ['attribute' > '500']
  • Converted query: ['attribute' > '00100500']
Simple, right?
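
To see what this asks of the application, here is a rough Python sketch of the offset-plus-padding conversion applied to the example dataset. The function name `encode` and the `int_digits` parameter are my own inventions; the offset of 100,000 and the width of 8 digits come from the example (the largest shifted value, 20171109, has eight digits before the decimal point). I use `Decimal` so the fractional digits survive exactly.

```python
from decimal import Decimal


def encode(value, offset=100_000, int_digits=8):
    """Offset a number, then zero-pad its integer part to a fixed width."""
    shifted = Decimal(str(value)) + offset
    if shifted < 0:
        raise ValueError("value is below the range the offset allows")
    whole, _, frac = format(shifted, "f").partition(".")
    if len(whole) > int_digits:
        raise ValueError("value is above the range the padding allows")
    padded = whole.zfill(int_digits)
    return f"{padded}.{frac}" if frac else padded


dataset = ["14.58", "-12536.791", "20071109", "655378.34", "-23"]
print([encode(v) for v in dataset])
# ['00100014.58', '00087463.209', '20171109', '00755378.34', '00099977']

print(encode(500))
# 00100500
```

Note that the query value has to go through exactly the same conversion as the stored values, or the string comparison is meaningless.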

What if you don't happen to know what your maximum or minimum value will be for the lifetime of your application? Oh well, you gave it your best shot, right? What are the consequences if you guess wrong? So some queries will be inaccurate. No biggie...

I'm not religiously attached to the relational data model and the theory behind it. I can see the value in having "loosely typed" data, which is what SimpleDB is doing. The argument is that this way you don't have to make assumptions about your data early on.

But the problem is, you actually do have to make assumptions. You have to know (or guess) how big and how small the numbers may get.
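
Here is a small Python illustration of what a wrong guess looks like. It is a sketch under my own assumptions: a hypothetical `naive_encode` helper, an offset of 100,000, and a padding width of 6 digits chosen because I "expected" nothing larger than 899,999.

```python
def naive_encode(value, offset=100_000, width=6):
    """Offset and zero-pad with no range check (deliberately naive)."""
    return str(value + offset).zfill(width)


threshold = naive_encode(500)     # '100500'
stored = naive_encode(900_000)    # '1000000', one digit wider than planned

# Numerically 900000 > 500, but the strings compare the other way around.
print(stored > threshold)
# False
```

So a query for everything greater than 500 silently skips the one value that outgrew the padding.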

And because you are applying padding and shifting, this means you really are making a firm decision early on that the value is a number, and really are never planning to treat it as anything else.

So the argument of flexibility doesn't seem to be holding water for me. I start wondering if it's the constraints of their architecture, and not user flexibility, that is driving this model. It's like, SimpleDB is simple because it has a simple data model (key/values stored as strings), not because it is simple to use.

I remember at Sybase, one of its selling points was that with the relational model, and with stored procedures and triggers, you can centralize your business rules in the database. This means that regardless of what applications, old or new, happen to touch your data, they can't mess up the integrity of your data. This was, and is, very popular with larger companies trying to manage hundreds of applications touching their databases built by as many developers across the globe, experienced and otherwise.

With SimpleDB, the responsibility to maintain business rules and data integrity lies with the application tier. If you have lots of applications touching your data, it seems to me it's a recipe for confusion at the least and potential data corruption at the worst.

So, if I were building an architecture on top of SimpleDB, I would seriously consider putting a very solid and unencroachable layer between my application code and SimpleDB to enforce the structure and rules I need to keep the data clean and accurate.
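
As a rough idea of what such a layer might look like, here is a Python sketch. Everything in it is hypothetical: the class names `IntCodec` and `GuardedStore` are mine, and a plain dictionary stands in for the actual data store, since I am not showing any real SimpleDB calls here. The point is only that range and type decisions live in one place, and bad values are rejected before they are stored.

```python
class IntCodec:
    """Encodes integers in a declared range as fixed-width, sortable strings."""

    def __init__(self, minimum, maximum):
        if minimum > maximum:
            raise ValueError("minimum must not exceed maximum")
        self.minimum = minimum
        self.maximum = maximum
        # Shift so the smallest allowed value maps to zero or above.
        self.offset = -minimum if minimum < 0 else 0
        self.width = len(str(maximum + self.offset))

    def encode(self, value):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("expected an integer")
        if not self.minimum <= value <= self.maximum:
            raise ValueError(f"{value} is outside the declared range")
        return str(value + self.offset).zfill(self.width)

    def decode(self, text):
        return int(text) - self.offset


class GuardedStore:
    """Only path to the data: validates and encodes every attribute."""

    def __init__(self, schema):
        self.schema = schema   # attribute name -> codec
        self._items = {}       # stand-in for the real string-only store

    def put(self, item_name, attributes):
        unknown = set(attributes) - set(self.schema)
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        self._items[item_name] = {
            name: self.schema[name].encode(value)
            for name, value in attributes.items()
        }

    def get(self, item_name):
        return {
            name: self.schema[name].decode(text)
            for name, text in self._items[item_name].items()
        }


store = GuardedStore({"balance": IntCodec(-12_000, 899_999)})
store.put("acct-1", {"balance": -23})
print(store.get("acct-1"))
```

With this in place, an out-of-range value such as 900,000 raises an error at write time instead of quietly corrupting later comparisons.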

As I continue to blog, I am hearing and learning from readers that there are many, many perspectives and requirements, and usually the answer for any architectural choice is "it depends."

So, what do you think? How would you approach these aspects of SimpleDB? What's to like, what's not to like?

James Governor: a great analyst article about Sun

I've enjoyed James' blogs over the years, but to me this is one of the most well written, insightful, and fun blogs I've read from him. It doesn't hurt that it has good things (very good things) to say about Sun.

Thursday, February 07, 2008

The House & other Arctic musings: -40 Fahrenheit

I really enjoy this blog from north of the Arctic circle. "no machine likes -40 and you can almost hear them scream as parts snap or shake off of their mounts." Yeesh!

Some rave reviews for Apache Derby

Someone just pointed me to this page reviewing Derby on

Some pretty positive reviews! Here are some treasure quotes. Note the range from embedded to client/server, with some very heavy data loads. The amazing thing is that these are some heavy users, and the derby-user list is very quiet, maybe 10 emails a day. It Just Works (TM).

The sign of a good embedded database system is not who’s running it, but rather who doesn’t know they’re running it! The ideal database is one that sits there in the background, working hard, and never raising a peep nor complaining. That’s Apache Derby!

Our software has been deployed by tens of thousands of organizations ranging from small schools all the way up to prestigious academic organizations and corporate banks. Although we offer the option to choose other external databases, the majority stick with Apache Derby as the embedded default. Many PaperCut NG installations handle thousands of print transactions a day and host datasets in the millions of rows. Apache Derby has proven that an embedded zero-administration database can be robust and scalable in a range of environments and operating systems.

I'm currently using derby for a distributed client architecture. I've tested with 3~5 million records that need to be kept in sync through all the clients. Derby has worked beautifully, selects, inserts all very fast. And I've even seen several searches with the same indexes run faster on Derby than Oracle.

We use Derby in our search engine project, which is in beta stage. Having tried alternatives (MySQL, PostgreSQL, Oracle), Derby provided us unmatched performance over large datasets. Our current dataset is around 30GB for each server; we read more than a million rows at a time and insert around 300 records per second. We are confident that Derby will not let us down along the way. We are planning to hit the 100GB mark for both of our servers during the next couple of months.

Tuesday, February 05, 2008

Get paid for improving NetBeans

I finally tuned into this new NetBeans program. If you'd like to make some money and you love to code, you can participate in the NetBeans Innovators Grants. See the page for details, but here's the money quote:
The NetBeans Innovators Grant is a process to provide grants to developers or teams of developers to work on an open source project. A total of 10 large projects will be chosen and awarded a grant of US$ 11,500 dollars. Another 10 smaller projects will be chosen and awarded a grant of US$ 2,000 dollars. Awards will only be awarded upon actual project completion. Projects that excel may receive one of two possible gold awards of US$ 11,000 dollars or two possible silver awards of US$ 5,000 dollars.

Submission deadline is March 3.

This Wiki page shows some ideas for projects already listed (I put in my plug for some database features like SQL command history (small) and database schema refactoring (large)).

Derby documentation quick search in Firefox - Knut Anders Hatlen's Weblog

Knut Anders has an excellent tip on how to make it easy to search Derby documentation using Firefox. Thanks, Knut!

Marion Nestle » Healthy people are too expensive for society?

I love this quote:
... obese people and smokers cost less to treat. Of course they do. They die sooner. Healthy people are expensive ... Economists have an interesting way of looking at things

Using the pre-installed SXDE VMWare image on MacOSX

As I mentioned in a previous post, you can download SXDE 01/08 as a VMWare image.

The download page, however, assumes you know what to do with the files (two large zip files) once you have them.

It wasn't that clear to me, although I figured it out after some searching around. I thought I'd describe the steps here for you.

  • Unzip the two files. This creates two separate directories, with a number of files. Copy all the files into a single directory. Your directory listing should look something like this:

    SunSXDE108-s001.vmdk SunSXDE108-s005.vmdk SunSXDE108.vmsd
    SunSXDE108-s002.vmdk SunSXDE108-s006.vmdk SunSXDE108.vmx
    SunSXDE108-s003.vmdk SunSXDE108.nvram vmware.log
    SunSXDE108-s004.vmdk SunSXDE108.vmdk

  • Edit the SunSXDE108.vmx file and add this line:

    monitor_control.disable_longmode = 1

  • Bring up VMWare Fusion, choose "File->Open" and point it to SunSXDE108.vmx
  • Log in as "root", password "sxde"
  • When you log in, it warns you that you are logging in as a privileged user, and suggests you not do this. I wholeheartedly agree. But log in anyway, so you can create a new user account. SXDE very graciously starts up the user admin GUI for you to do this.

  • Log out of root and log in as the new user

  • VMWare tools don't work completely so are not installed. This requires a fix from VMWare, which I hope we'll see soon. This is a cutting-edge distribution, after all...
  • Fix the screen resolution. On my 17" MacBook Pro, much trial and error led me to pick 1680x1050 - not too small, not too large, just right.
That should do it!
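
If you'd rather script the .vmx edit than do it by hand, something like this small Python helper should work. It is only a sketch: the function name `ensure_setting` is mine, and it assumes the .vmx file is plain text with one `key = value` setting per line.

```python
from pathlib import Path

SETTING = "monitor_control.disable_longmode = 1"


def ensure_setting(vmx_path, setting=SETTING):
    """Append the setting to the .vmx file unless its key is already present.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(vmx_path)
    lines = path.read_text().splitlines()
    key = setting.split("=")[0].strip()
    if any(line.split("=")[0].strip() == key for line in lines):
        return False
    lines.append(setting)
    path.write_text("\n".join(lines) + "\n")
    return True
```

You would call it as `ensure_setting("SunSXDE108.vmx")` from the directory holding the unzipped files, with the virtual machine shut down.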

Sequel, Es-Qyew-Ell, Squell

Now that Sun has purchased MySQL, there have been some internal discussions about the right pronunciation. I always said "Sequel" and "MySequel." When I was at Sybase, we called it "Sequel Server" and I hear tell that's the pronunciation Oracle uses too.

But the official pronunciation of MySQL is "My-Ess-Qyew-Ell." Ah well, I have to train my tongue out of respect for our new colleagues, even though Sybase and Oracle were there first.

This all reminds me of the time when I was at Sybase (pronounced "sea bass" in France) and was sent onsite to a big news agency on Long Island whose client application (written in Ada of all things) kept crashing. It was a nasty timing bug that was so sensitive that print statements would make it go away. This was with fully asynchronous code where you registered callbacks, and asynchronous system traps (ASTs - remember?) would call the next step when the I/O completed. Talk about fun.

After three days of head banging I was sorely tempted to tell them to just put in print statements, but I knew that wouldn't fly...

I finally found it by reading, line by line, the application code until I logically determined what the problem was, and it was fixed with a few lines of code.

Anyway, my point is, the folks at this shop called SQL "Squell". Eeeww. I always had an uncomfortable feeling when I heard it pronounced that way, like they were squashing a snail.

Then of course there is Apache Derby. And Linux. Can't we all just get along?

Solaris Express Developer Edition with NetBeans 6, AMP and Ruby

Solaris Express Developer Edition (SXDE) is a quarterly distribution of the latest and greatest Solaris stuff. The releases are named after the quarter, and the release that just came out today is called SXDE 01/08. You can download it here. Note you can get the DVD image, or if you're using VMWare like me, you can just get the VMWare image -- pretty cool!

This release is of particular interest because for the first time it includes NetBeans out of the box, in this case NetBeans 6.0.

This release also includes an integrated runtime stack for Apache-MySQL-PHP and for Ruby, along with tools to make it easy to set up and manage these stacks. Getting Apache and MySQL up and running was quite a snap!

I've been playing around with this release for a while, making sure NetBeans plays nicey nicey with the runtime stacks. There are a few hiccups, but for the most part it works quite well.

I wrote up a number of how-tos on the NetBeans wiki site, like how to get started with using the Ruby runtime in NetBeans or how to start up the PostgreSQL service. If you're interested, get SXDE, install it (in VMWare if you're like me), and use these how-tos to help you get your feet wet.

One thing I thought was pretty cool was how easy it was to get going with debugging in PHP. You just have to enable debugging, set a breakpoint, and go.

I have been working with Solaris for a number of years, and these days the leaps and bounds they are making in improving the desktop experience are quite astounding. The installer is actually reasonable. I am also looking forward to the new package management system, IPS, that is coming out as part of Project Indiana.

So, try it out, or stay tuned, as you see fit, but this Solaris thing is definitely coming along. The report from Europe isn't so bad either... :)