It’s been nearly two weeks since an overloaded authentication server knocked Amazon’s S3 storage platform offline for several hours, and the company has yet to provide specifics on the fixes it has promised.
When InternetNews.com asked Amazon about service improvements this week, the company offered the same statement it had issued on the day of the outage.
“We are taking immediate action on the following: improving our monitoring of the proportion of authenticated requests, further increasing our authentication service capacity, and adding additional defensive measures around the authenticated calls,” Amazon stated.
Amazon has also told users it has begun work on a service health dashboard, but as of this week it could not provide a specific delivery date.
The company launched S3 in the spring of 2006, providing a Web services interface that lets developers store and retrieve data. It currently has 330,000 registered developers.
The outage occurred when one of S3's three service locations saw elevated levels of authenticated requests from multiple users.
At the time, Amazon reported that although it monitors overall request volumes, it had not been monitoring the number of authenticated requests.
“Within a short amount of time, we began to see several other users significantly increase their volume of authenticated calls. The last of these pushed the authentication service over its maximum capacity before we could complete putting new capacity in place,” a company spokeswoman wrote to users.
According to Amazon, the burden of processing authentication requests and validating accounts led to the failure at that location.
Amazon issued the statements on its Web Service Developer Connection blog.
As one analyst noted, Amazon struggled with service-level agreements (SLAs) even before the failure. In fact, Amazon debuted its SLAs only after new players in the storage-in-the-cloud market, such as Nirvanix, emerged, Bob Laliberte, an analyst at Enterprise Strategy Group, told InternetNews.com.
“The Amazon S3 platform has been hugely successful, especially for development environments,” Laliberte said.
“The main concern is for companies building a business leveraging this platform. For example, there are some online backup providers that leverage the Amazon S3 back end, and they have to take into account how an outage will impact their business,” he said.
Yet according to Laliberte, the outage shouldn’t be taken as an indication that the storage approach isn’t viable.
“As Web 2.0 applications continue to proliferate and users leverage their services, there is the expectation that these services are a utility and are always available,” he added.
The key, Laliberte said, is that if you’re betting your business or a client’s business on storage in the cloud, you must fully understand the SLA and support available.
“Then it’s time to decide if that is an acceptable risk,” the analyst said.