From the ‘Everyone Makes Mistakes’ files:
Google Gmail users were hit with a 100-minute outage yesterday due to an upgrade issue.
Ben Treynor, vice president of engineering for Google Gmail, blogged that Google (NASDAQ: GOOG) took some of the Gmail servers offline on Tuesday morning for routine upgrades. Those upgrades led to the service disruption.
That’s right: because of a miscalculation on Google’s part, an upgrade that should have improved service instead left tens of millions of Gmail users around the world with no service at all.
“We had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response,” Treynor blogged.
In my opinion, this is a classic load-balancing newbie error. The problem is, Google isn’t a newbie.