Azure Outage Caused by System Update Problem

Mar 20, 2009

The blackout of Microsoft’s Windows Azure servers last weekend was due to a glitch in an operating system update, the company said this week.

Microsoft’s disclosure came in a post Wednesday on the Windows Azure blog.

The roughly 22-hour service outage began at around 10:30 p.m. Pacific last Friday and lasted into Saturday evening.

“During a routine operating system upgrade on Friday (March 13th), the deployment service within Windows Azure began to slow down due to networking issues. This caused a large number of servers to time out and fail,” the Windows Azure team said in their post.

The servers were back up and functioning normally by Saturday evening, later blog posts said. However, pinning down the cause of the outage, so that it would not happen again, took longer.

Microsoft’s engineers found that, as application servers failed, they began notifying a server called the Fabric Controller. Part of the controller’s job is to recover crashed applications by moving them to other servers, but as more and more servers failed, the resulting cascade overwhelmed the Fabric Controller as well.

“We are addressing the network issues and we will be refining and tuning our recovery algorithm to ensure that it can handle malfunctions quickly and gracefully,” Wednesday’s post continued.

Microsoft also suggested that developers run more than one instance of each application, because applications running multiple instances were less likely to fail.

Windows Azure, often shortened to simply Azure, is Microsoft’s (NASDAQ: MSFT) cloud computing environment. Azure has been available as a Community Technology Preview since it was introduced at Microsoft’s Professional Developers Conference last October. Microsoft is in the process of building datacenters worldwide to support Azure when it is released.

During last weekend’s blackout, developers who were testing their code on Azure services received error messages informing them that their applications were “unreachable or in ‘stopped’ or ‘initializing’ states for long periods of time,” according to a statement posted during the outage.


InternetNews is a source of industry news and intelligence for IT professionals from all branches of the technology world. InternetNews focuses on helping professionals grow their knowledge base and authority in their field with the top news and trends in Software, IT Management, Networking & Communications, and Small Business.

Property of TechnologyAdvice. © 2025 TechnologyAdvice. All Rights Reserved
