Tuesday, September 18, 2007

Session State is Evil

As I mentioned in a previous post, I was responsible for architecting a clustered version of Sun's app server using HADB, a highly available, highly scalable database. Why did their clustered version need a scalable database? For one specific reason: session state.

One of the things that really sucked with the previous version of the app server was that their session state replication mechanism was completely broken. It would corrupt data or get into deadlocks, and couldn't scale beyond three or four nodes.

This was my first exposure to the demands of session state and how it impacts the back-end architecture. HADB is attractive because it, basically, never goes down, and it is transactional, so it would solve the reliability and scalability problems of the previous implementation.

The more I learned about session state, the more I disliked it. Its impact on a system trying to be scalable and available was enormous.

First of all, load balancers have to be able to detect the session id and make sure that a request for an existing session is routed back to the same instance. Only certain hardware load balancers know how to do this, and it requires digging into the HTTP message to pull out a special cookie that tells you what instance owns the session.
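
To make the load balancer's job concrete, here is a minimal Python sketch of cookie-based sticky routing. The cookie name `JSESSIONID`, the "session id, dot, instance name" convention, and the instance names are all assumptions for illustration, not a description of any particular load balancer.

```python
from http.cookies import SimpleCookie
from itertools import cycle

# Hypothetical instance names; a real balancer would read these from its config.
INSTANCES = ["node1", "node2", "node3"]
_round_robin = cycle(INSTANCES)


def route(cookie_header):
    """Pick the instance for a request, given its raw Cookie header (or None)."""
    cookies = SimpleCookie()
    if cookie_header:
        cookies.load(cookie_header)
    morsel = cookies.get("JSESSIONID")
    if morsel is not None and "." in morsel.value:
        # Assumed convention: "<session id>.<owning instance>"
        owner = morsel.value.rsplit(".", 1)[1]
        if owner in INSTANCES:
            return owner  # existing session: must go back to its owner
    # No session yet (or an unknown owner): any instance will do.
    return next(_round_robin)
```

Note that the balancer has to parse every request this way before it can forward it, which is exactly the "digging into the HTTP message" cost.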

Secondly, application servers have to be able to handle all this session state without using up all the memory of the system. So EJB invented this marvelously ugly word, "passivation", by which session state is written out to disk and dropped out of memory - in other words, a virtual memory system for session state.

And then there is the impact on the complexity and scalability of clustered systems. Because you never know when a given instance may go down, session state has to be relocatable to any other instance in the cluster. Trying to build something that does this correctly is complex, painful, error-prone, and, to be honest, annoying.

For the end users, you end up with systems that are always limited in some way or another. If you store session state in a shared database, this impacts response time, and this database is another moving part you have to administer. And it had better be highly available and scalable; otherwise it can only handle so many instances -- ultimately the database itself limits scalability.

If you use in-memory replication, then either you have to make the state available to all instances, which means lots and lots of data flying around the network, constraining scalability, or you have to pair up instances. Pairing is dangerous, because if one instance goes down, the other instance has to handle the entire workload of the downed instance's existing sessions -- you end up with hot spots and, if you're not careful, a wonderful cascading failure scenario.
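
The hot-spot arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope model, assuming sessions are spread evenly across instances and every instance runs at the same utilization before the failure; the function name and the numbers in the usage note are made up for illustration.

```python
def load_after_failure(instances, utilization, paired):
    """Utilization of the busiest surviving instance after one instance fails.

    paired=True:  the failed instance's sessions all land on its one partner.
    paired=False: the sessions are spread evenly over all survivors.
    """
    if paired:
        # The partner carries its own load plus all of the failed instance's.
        return 2 * utilization
    # Total work is unchanged but is now shared by one fewer instance.
    return utilization * instances / (instances - 1)
```

With 10 instances at 60% utilization, the paired case puts the surviving partner at 120%, which is over capacity, while spreading the load would put each survivor at about 67%. An overloaded partner that then falls over is how the cascade starts.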

There is a giant sucking sound of processing power, time, money, and intellectual resources being used up on this problem, all because someone wants to store a shopping cart in HttpSession.

All of which leads me to conclude: session state is evil.

When I read about the principles of REST, I was really taken by it. One principle that really stood out for me was: stateless. HTTP was never meant to be stateful, and for good reason. Imposing state on this protocol is a bastardization of its original intent.

Rather than share my potentially confused view of this, I'll quote Roy Fielding[1] directly from his slides on REST that he gave at RailsConf:
  • A successful response indicates (or contains) a current representation of the state of the identified resource; the resource remains hidden behind the server interface.
  • Some representations contain links to potential next application states, including direction on how to transition to those states when a transition is selected.
  • Each steady-state (Web page) embodies the current application state -- simple, visible, scalable, reliable, reusable, and cacheable network-based applications
  • All application state (not resource state) is kept on client
  • All shared state (not session state) is kept on origin server
Do you get it? I repeat: all application state is kept on the client.

So stop asking the server to "hold on to things" for you. Please. If you do that one simple thing when you build an application architecture, you have freed the server infrastructure; you have given it wings.

Today it is easier than ever before to do this. With AJAX and RIA toolkits like JavaFX (shameless plug), you can provide dynamic interaction with the user without having to ping the server all the time and have it store conversational state. You can even use Java DB or Google Gears to keep your client-side state persistent.

Because of these new web client technologies and patterns, I don't buy the concern that saving state on the client necessarily results in more network traffic. You change your interaction with your server from lots of little requests with little bits of data to much less frequent requests with more data. In general, because of the overhead of a network request, I believe this is a Good Thing.

So, I know it's tempting to use session state. I know it seems easy at first. I know that all the server products out there give you utilities to do it and talk it up (they're even talking about letting you keep even more state around). But Just Say No. Don't Do It.

I have seen where it takes you, and it ain't pretty.


[1] I have a funny story about the first time I met Roy Fielding. I was at the Derby booth at ApacheCon, and Roy had come over to learn more about it. I didn't know him from Adam, and I asked him why he was interested. He said he was starting a new project in Apache to do document management.

I said with a big smile "welcome to Apache!". He looked at me for a second and said, "well, actually, I've been around Apache for many years..."

It was only later that I discovered that he was one of the founders and is currently a VP at Apache :)

12 comments:

dgurba said...

I think I understand your post with respect to scalability and that using session state hurts that.

And you say ... "just don't do it". My only question then is as a site developer for businesses and ecommerce sites ... are there RIA-based shopping carts out there that would use JavaDB (for instance) to power the shopping cart?

Or are there little/no opportunities to wrest myself from using the evil session (let's not talk about cookies ... let's just not go there :P)

David Van Couvering said...

dgurba: Great question. I think because storing session state on the client is a somewhat new concept, enabled by technologies that haven't been around for long, there aren't yet a lot of high-level components out there to help you (at least that I know of).

You might want to take a look at Dojo Offline or Google Gears if you're building a web app.

If you're building a Java-based RIA, Java DB is definitely a good choice, and if you combine that with JPA (Lance Andersen has a good blog about how to do this), then you can simply make your ShoppingCart object persistent and you're done.

So I think there are opportunities, but it doesn't have the level of Big Company support that session state seems to have these days (bad companies, no biscuit).

Anonymous said...

Can you explain how I transport state (say I have my state in a big hash-table in js) when the user opens a new tab?

The main use for sessions (at least for us) is to map two different calls (http is stateless) to the same user on the server.

David Van Couvering said...

anonymous: do you want the two separate tabs to be the same session and share session state? If I understand REST principles correctly, then this shouldn't be needed, and actually goes against the principles of HTTP (as you noticed, since HTTP is stateless).

Each page in your app represents a given state, and should contain all the information needed for your user to navigate to a different state.

So, for example, say you have one page that lists books, with URLs for different books, and the user chooses to open one of the URLs in a different tab. The URL should contain all the data needed for that new page, and when the new tab is opened, the response from the server contains all the information needed for that new state, including URLs that let the user navigate to another state (including, for example, how to get back to your original book list, without assuming the Back button of the browser will get you there).

Another important thing to think about is -- what is application/session state, and what is a resource? For example, is a "shopping cart" session state, or is it something more persistent that the server should store in a database? Note that Amazon stores your shopping cart in its database.

I think the principle of the thing is: state should either be stored in a database on the backend (in which case it should be considered a resource which can be identified as a URI), or cached in the client (encoded in the content of the current page if possible, or stored in local client storage if needed), but it should never be kept in the app server as a "session state."

Finally, I think there are people out there with much more expertise than I who can provide guidance, so you should poke around, and I will too.

David Van Couvering said...

I want to adjust one thing I said. I'm such a database weenie that I assume resources are stored in a database. But a resource can be stored anywhere - in a file system, in durable memory, on tape, whatever. You could even have the state rolling around in a circle from machine to machine like a sushi boat :)

The only requirement is that, whenever a browser sends a request to a URI that identifies that resource, the server, without depending on any previous requests from the same session, can respond with a correct representation of that resource.
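
A rough Python sketch of that requirement, using the book list example from earlier in the thread. The URIs, titles, and the in-memory dictionary standing in for durable storage are all hypothetical; the point is only that the handler's answer depends on the URI and the stored resource, never on what this client asked for previously.

```python
# Stand-in for any durable store (database, file system, sushi boat, ...).
RESOURCES = {
    "/books": {"books": ["/books/1", "/books/2"]},
    "/books/1": {"title": "Example Book One", "list": "/books"},
    "/books/2": {"title": "Example Book Two", "list": "/books"},
}


def handle_get(uri, store):
    """Answer a GET using only the URI and the stored resources.

    There is no session argument: nothing here remembers earlier requests.
    """
    resource = store.get(uri)
    if resource is None:
        return 404, None
    # Each representation carries links to the next possible states,
    # including the way back to the list.
    return 200, dict(resource)
```

Because `handle_get` keeps nothing between calls, any instance in a cluster could answer any request, in any order, and a new browser tab opened on `/books/1` works without the server knowing about the first tab.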

Anonymous said...

Unfortunately, leaving session state on the client facilitates the classic kinds of faking by malicious users.

I suppose you could encrypt the state or include other kinds of validation mechanisms.
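
One such validation mechanism is to have the server sign the state with an HMAC before handing it to the client and to check the signature when the state comes back. The Python sketch below uses only the standard library; the key, the token layout ("payload, dot, signature"), and the function names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a secret known only to the servers, never sent to the client.
SECRET_KEY = b"replace-with-a-real-server-side-secret"


def sign_state(state):
    """Serialize client state and append an HMAC-SHA256 signature."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    tag = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return payload + "." + tag


def verify_state(token):
    """Return the state if the signature matches, otherwise None."""
    payload, sep, tag = token.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison, on bytes so odd input cannot raise.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Signing only detects tampering. The client can still read the state, and an old but validly signed token can be replayed, so anything secret or time-sensitive would need encryption or an expiry field on top of this.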

gregjor said...

I agree that maintaining session state on the server can get messy. I also agree that too often the server is asked to maintain state that could just as easily stay on the client. Your article, though, takes a narrow view of session state, and I can't agree that "session state is evil."

Cookies, the original state mechanism, are stored and managed on the client. Cookies give the server-side code a place to store a small amount of information without depending on the HTML page -- no need for hidden form fields or magic values in the URLs. Cookies also don't depend on Javascript. When you write "HTTP was never meant to be stateful" I counter that it was never meant to support AJAX or Google Gears, either. Cookies have been part of HTTP for a lot longer than AJAX and were invented specifically to allow for statefulness. As another person commented, cookies are not secure and are limited in size and number, so they aren't always appropriate for maintaining state. The same arguments go for embedding state in the HTML.

Your argument about load balancers and web servers having to know about the session state mechanism is a bit of a red herring. That problem is not inherent in session state; it comes from web server vendors implementing the state mechanism in the web server (see IIS, WebObjects, etc.). A more robust implementation is storing the session information in a shared database or file system, and only passing the key to the session data from the client to the server. Finding and restoring session state when the server receives a request is done by the server application, not in the web server itself, so it doesn't matter which load balancer or clustered web server gets the request as long as every server that can receive a request has access to the shared session storage. This can be implemented with exactly the same tools used for any shared database, so session state is no different from, say, a database of customer records. Implementing session state in the web server itself, or the load balancer, while possibly more efficient, is pretty much always a bad idea in the long run because of scalability and the other problems you point out.

In your follow-up comments you clarify your argument: "... state should either be stored in a database on the backend (in which case it should be considered a resource which can be identified as a URI), or cached in the client (encoded in the content of the current page if possible, or stored in local client storage if needed), but it should never be kept in the app server as a 'session state.'" When you put it that way I agree with you; I have seen session state abused and overused too many times.

Anonymous said...

Gregjor, first of all, state stored on the client is perfectly OK. Nobody objects to that. Storing state on the SERVER is the problem.

As to the "it was never meant to support AJAX or Google Gears, either" you are, quite simply, wrong. While those technologies were obviously not invented at the time, HTTP was fundamentally designed to support a wide variety of applications that interact in a stateless way.

Please educate yourself about the original designs and intentions of HTTP and RESTful architectures before you post this sort of nonsense.

Gerald O'Steen said...

Please forgive my naivety, as web-dev is not my specialty, but are PHP sessions considered 'session state'? It sounds as though it is so. However, if that is true, isn't it basically the same as accessing a server-side resource on a filesystem? I mean, the server (or PHP, in this case) is generating a session identifier from a client's connection information and using that as a key for the resource stored on disk (or wherever). Is that not still just as stateless as having the browser send some sort of state information directly?

It does occur to me that I could be incorrect on the manner in which PHP sessions are created. Perhaps, instead of my above assumption (which, admittedly, is probably error-prone), it is generated by the worker thread handling the client connection itself, and the information persists for the life of that thread only. In this case, would serializing the session state (via the appropriate methods) to the filesystem or a database on the server and using a cookie to store the session identifier be more or less 'evil'? Does it lose any 'good' points due to higher resource usage and thus lower performance?

David Van Couvering said...

Yes, a PHP session is "session state". And yes, often that session state is stored on disk. That's actually the problem. If you are trying to scale out, then you will have multiple systems. Unless you have a shared disk, the other systems won't have access to that state.

And let's say you do have a shared disk or other shared resource like a database. If you grow to a very large scale, then that database itself needs to scale out. So people start clustering their databases.

Often that is fine and works for many sites. But for some sites, even that is not enough scale. If you can avoid session state, that's a great thing.

In my experience, however, most web sites just use session state, and it's actually fine. I've gotten a little more mellow about this as I've gotten older :)