Author's note: I was thinking about this blog and how I didn't quite capture some things correctly, particularly around what it means for processing to become serialized. Then I saw some comments that repeated my own concerns. So I have done some rewriting in an attempt to explain more clearly what I think the issues are...
When examining the challenges of throughput and scalability in the brave new world of Web Scale, one solution I've been looking at is
Gigaspaces.
I recently read
their article about scalability (follow the link to "The Scalability Revolution" - you may need to register to be able to see it), and they made a claim that gave me pause, to say the least.
Basically they said that if you have a multi-tier architecture, you are doomed: sooner or later you are going to hit a scalability wall, whether because of cost, complexity, or latency. In particular, I think they meant that if your
end-to-end request/response path must traverse multiple tiers to complete, you are doomed.
They have examples to back up this claim, but what was compelling for me was their reference to
Amdahl's Law.
Amdahl's Law states that if any portion of your system requires processing to be serial rather than parallel, then the system will hit a scalability wall beyond which
it can go no further, no matter how much hardware you throw at it. From the Wikipedia article:
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours using a single processor core, and a particular portion of 1 hour cannot be parallelized, while the remaining 19 hours (95%) of execution time can be parallelized, then regardless of how many processors we devote to a parallelized execution of this program, the minimum execution time cannot be less than that critical 1 hour. Hence the theoretical speedup is limited to at most 20x.
Note that the actual numbers don't matter too much. You can push the wall further out by reducing the amount of serialization, but as long as any serial portion remains, the wall is still there. And even before you hit it, the cost of squeezing out a little more speedup rises steeply. I have heard many stories to vouch for this, where getting 10% more throughput requires 100 times the hardware investment and/or a complete rewrite of the application.
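To make the numbers concrete, here is a quick back-of-the-envelope calculation (my own sketch, not from the article). Amdahl's Law gives the speedup as 1 / (s + (1 - s)/n), where s is the serial fraction and n is the number of processors; plugging in the 5% serial fraction from the Wikipedia example shows both the 20x ceiling and how quickly the returns diminish:

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's Law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

# 5% of the work is serial, as in the Wikipedia example (1 hour out of 20).
SERIAL = 0.05

for n in (1, 2, 4, 16, 64, 256, 1024, 1_000_000):
    print(f"{n:>9} processors -> {amdahl_speedup(SERIAL, n):6.2f}x speedup")

# Prints (approximately):
#         1 processors ->   1.00x speedup
#         2 processors ->   1.90x speedup
#         4 processors ->   3.48x speedup
#        16 processors ->   9.14x speedup
#        64 processors ->  15.42x speedup
#       256 processors ->  18.62x speedup
#      1024 processors ->  19.64x speedup
#   1000000 processors ->  20.00x speedup   (the wall)
```

Going from 64 to 256 processors quadruples the hardware for about 20% more speedup, and it only gets worse from there.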
If your system architecture requires you to potentially wait for a shared resource within the request/response path, then you are shifting from parallel to serial execution, and you are going to hit Amdahl's Wall. Here are some examples (a small simulation after the list shows the effect in action):
- Reading/writing data in a shared database
- Sending a request over a network which will block if the network is congested (the network is shared by multiple simultaneous requests)
- Writing to a disk which may block if it is busy, where the disk is shared between multiple simultaneous requests.
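To see that serialization in action, here is a small, purely illustrative simulation (the 9 ms / 1 ms timings are made-up numbers, not from the article): each request does some work that parallelizes perfectly, plus a slice of work while holding a lock that stands in for the shared database, disk, or network. Throughput climbs as you add threads, then flattens out near the limit set by the serial slice:

```python
import threading
import time

# Made-up numbers purely for illustration: each request does 9 ms of work that
# parallelizes perfectly, plus 1 ms holding a lock that stands in for a shared
# resource (a database row, a congested network, a busy disk).
PARALLEL_WORK_S = 0.009
SHARED_WORK_S = 0.001
REQUESTS_PER_THREAD = 50

shared_resource = threading.Lock()

def handle_requests():
    for _ in range(REQUESTS_PER_THREAD):
        time.sleep(PARALLEL_WORK_S)   # the part of the request that parallelizes
        with shared_resource:         # the part where every request queues up
            time.sleep(SHARED_WORK_S)

def throughput(thread_count):
    threads = [threading.Thread(target=handle_requests) for _ in range(thread_count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return thread_count * REQUESTS_PER_THREAD / elapsed

for n in (1, 2, 8, 32):
    print(f"{n:>2} threads -> roughly {throughput(n):.0f} requests/sec")
# Throughput rises at first, then flattens near 1 / SHARED_WORK_S (about 1000/sec)
# no matter how many threads you add: the lock is the serial fraction.
```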
Well, that does seem to throw a wrench in things. How can a service run independently, with no database access, disk I/O or network I/O, and still be useful?
Well, the key is that these operations
must not happen within the request/response path. They can still be done in the background, asynchronously. For example, if the user updates their profile, the update is applied in memory and then, in the background, persisted to the database. If you want to send a message to another service, the message is queued up in memory and then delivered to the external service asynchronously.
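Here is a minimal sketch of that write-behind idea (my own illustration, not Gigaspaces code, and the function names are made up): the request path touches only an in-memory dictionary and an in-memory queue, and a background thread drains the queue and performs the slow database write outside the request/response path:

```python
import queue
import threading
import time

# In-memory state that the request/response path reads and writes directly.
profiles = {}

# Work queued up for asynchronous persistence / delivery to other services.
write_behind_queue = queue.Queue()

def update_profile(user_id, profile):
    """Request/response path: memory only, no database, disk, or network."""
    profiles[user_id] = profile
    write_behind_queue.put(("save_profile", user_id, profile))
    # Return to the user immediately; persistence happens in the background.

def persistence_worker():
    """Background thread: the only place that touches slow, shared resources."""
    while True:
        operation, user_id, profile = write_behind_queue.get()
        try:
            time.sleep(0.05)  # stand-in for the real database write or message send
            print(f"persisted {operation} for {user_id}")
        except Exception:
            # A failure here happens after the user has already seen "success";
            # more on that window below.
            print(f"failed to persist {operation} for {user_id}")
        finally:
            write_behind_queue.task_done()

threading.Thread(target=persistence_worker, daemon=True).start()

update_profile("alice", {"displayName": "Alice"})  # returns immediately
write_behind_queue.join()  # only for this demo, so the script waits for the worker
```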
Note the consistency implications of this. You end up with a system where not everybody has a consistent view of the current state. But we already know from the
CAP theorem that this is something you probably need to accept.
There is also a window in which the user believes the update succeeded, but the actual persistence of that update later fails or runs into some kind of conflict. This means your company and your application
have to know how to apologize.
But if the benefit is linear scalability, these tradeoffs may well be worth it. Better to apologize to a small subset of customers about a lost order than to apologize to all of them for terrible latency, or to your management for the cost and time overruns of your system.
Gigaspaces claims to have accomplished this kind of architecture with a runtime environment they call a
space, which provides in-memory access to all the resources you need to process a request. The underlying Gigaspaces runtime ensures that each request is routed to the right cluster node, and that the node holds in memory all the resources that request needs to get its job done. To be honest, I am still mystified as to how they accomplish this, and it is something I would like to investigate further.
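My guess, and it is only a guess, is that it boils down to partitioning: hash a routing key on each request to pick a partition, keep each partition's data entirely in one node's memory, and the request never has to leave that node. A speculative sketch of that general idea (not Gigaspaces' actual API; all the names here are mine):

```python
# Speculative sketch of partition-based routing, not Gigaspaces' actual API.
# Each partition owns an in-memory slice of the data, and a request's routing
# key decides which partition (and therefore which node) handles it.

NUM_PARTITIONS = 4

class Partition:
    """One node's in-memory slice of the data; no shared database in the request path."""

    def __init__(self, partition_id):
        self.partition_id = partition_id
        self.orders = {}  # everything this partition needs, held in memory

    def handle_place_order(self, customer_id, order):
        # Everything this request needs is already in this node's memory.
        self.orders.setdefault(customer_id, []).append(order)
        return f"order accepted on partition {self.partition_id}"

partitions = [Partition(i) for i in range(NUM_PARTITIONS)]

def route(routing_key):
    """Hash the routing key so all requests for one customer land on the same node."""
    return partitions[hash(routing_key) % NUM_PARTITIONS]

print(route("customer-42").handle_place_order("customer-42", {"sku": "ABC", "qty": 1}))
```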
At any rate, I appreciate the Gigaspaces article helping me understand the implications of Amdahl's Law. As I evaluate a scalability architecture, if something, somewhere, requires the request/response path to potentially wait on a shared resource, then I know that sooner or later we've got trouble. It's the law.