Friday 24 January 2014

Google apologises for 30-minute outage, explains what went wrong

Earlier today, some Google services went down for users around the world, including in India. Google not only fixed the problem quickly but also apologised promptly through its Google Plus account. Along with the apology, the search giant explained what caused the outage. Services such as Gmail, Google+, Calendar and Documents were affected. About 10 percent of users were affected for around 25 minutes, while for others the problem persisted for as long as 30 minutes.

So, here’s what happened. An internal system that generates configurations – information that tells other systems how to behave – generated an incorrect configuration due to a software bug. The incorrect configuration was sent to live services over the next 15 minutes. This caused users’ requests for their data to be ignored, and those services, in turn, generated errors.
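To make the failure mode concrete, here is a minimal Python sketch of the kind of check that can sit between a configuration generator and live services. It is an illustration only, not Google's system: the `Config` class, the `REQUIRED_SERVICES` set, and the `validate` and `choose_config` functions are all hypothetical names invented for this example, and the rule it applies (every service must have a non-empty backend address) is an assumption.

```python
from dataclasses import dataclass

# Hypothetical list of services that every configuration must cover.
REQUIRED_SERVICES = {"gmail", "calendar", "documents"}


@dataclass
class Config:
    """A made-up configuration: a version number plus a backend per service."""
    version: int
    backends: dict[str, str]


def validate(config: Config) -> list[str]:
    """Return a list of problems found in the config; empty means it passed."""
    problems = []
    # Every required service must appear in the configuration.
    for name in sorted(REQUIRED_SERVICES - set(config.backends)):
        problems.append(f"missing backend for {name}")
    # Every listed service must point at a non-empty address.
    for name, address in sorted(config.backends.items()):
        if not address:
            problems.append(f"empty backend address for {name}")
    return problems


def choose_config(candidate: Config, last_good: Config) -> Config:
    """Keep serving the last known-good config if the candidate fails checks."""
    return last_good if validate(candidate) else candidate


good = Config(1, {"gmail": "host-a", "calendar": "host-b", "documents": "host-c"})
bad = Config(2, {"gmail": "host-a", "calendar": ""})

for problem in validate(bad):
    print(problem)
print("serving version", choose_config(bad, good).version)
```

The design choice in this sketch is that a rejected configuration never replaces the one already in use, so a generator bug leaves services on the previous settings instead of spreading to them.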

Users then began seeing these errors on the affected services, and at that point Google’s internal monitoring system alerted the Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new, correct configuration and began sending it out, after which the errors subsided.

Now that the services are back up, Google has said it will take steps to avoid such issues. In the Google+ post, it said, “we are putting more checks and monitors in place to ensure that this kind of problem doesn’t happen again.”
