Last Friday afternoon, from about 3pm to 3:45pm, Lamplight suffered a significant loss of availability, and most customers would have been unable to access their systems. We’re sorry for the inconvenience this will have caused you.
We had a massive surge of traffic which overwhelmed our servers. This was detected automatically within a couple of minutes and new servers were provisioned. However, as soon as they came online they were swamped too, and very quickly became unresponsive. Further servers started up automatically, and the cycle continued. This was unlike anything we had experienced before: the rate of traffic, and the load it placed on our servers, was much higher than we usually see, and it also continued for longer.
By examining the incoming traffic we realised it was all coming from a single logged-in user, who was repeatedly refreshing the home page (up to 10 times a second). We blocked traffic from that IP address, restarted the servers, and normal service resumed.
We contacted the customer in question, and it turned out that one user had a book on their keyboard, resting on the ‘F5’ refresh key. Because they were logged in, the amount of work the servers had to do for each request was much greater than it would have been for a logged-out user. This explained both the rate of traffic and the load it placed on the servers.
In the next couple of weeks we plan to make further changes to our hosting infrastructure to detect and respond to these types of event faster and more effectively. Again, we’re sorry for the impact this will have had on you.
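For readers interested in the technical side: one common way to catch this kind of event early is to limit how many requests a single client can make within a short window. The sketch below is purely illustrative and does not describe Lamplight's actual or planned setup. The class name, the limit of 20 requests per 10 seconds, and the idea of keying on a user or IP identifier are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter: at most max_requests per window."""

    def __init__(self, max_requests=20, window_seconds=10.0, clock=time.monotonic):
        self.max_requests = max_requests      # assumed threshold, not a real setting
        self.window_seconds = window_seconds  # assumed window length
        self.clock = clock                    # injectable so the limiter can be tested
        self._hits = defaultdict(deque)       # client id -> times of allowed requests

    def allow(self, client_id):
        """Return True if this request is within the limit, False otherwise."""
        now = self.clock()
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    # Simulate one client refreshing 10 times a second for 5 seconds.
    fake_now = [0.0]
    limiter = SlidingWindowLimiter(clock=lambda: fake_now[0])
    allowed = 0
    for i in range(50):
        fake_now[0] = i * 0.1
        if limiter.allow("user-a"):
            allowed += 1
    print(f"allowed {allowed} of 50 requests")
```

With a limit like this, a stuck refresh key would be refused after its first few requests while other clients carried on unaffected. A real deployment would also need to expire idle clients from memory and share counts across servers, which this sketch leaves out.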