Don’t Blame Disks For Every Storage Failure

Written By
Judy Mottl
Feb 26, 2008

While disk problems are a big culprit in storage subsystem failures, enterprises might want to start eyeing physical interconnects, which are to blame just as often.

That’s according to researchers at the University of Illinois at Urbana-Champaign and Network Appliance. The team (Weihang Jiang, Chongfeng Hu and Yuanyuan Zhou of the university, and NetApp’s Arkady Kanevsky) concluded in a recent study that disks were responsible for 20 to 55 percent of storage subsystem failures.

But they also found that physical interconnects including shelf enclosures could claim even higher failure rates: 27 to 68 percent.

“Disks are not the only component in storage systems,” wrote the study’s authors. “To connect and access disks, modern storage systems also contain many other components, including shelf enclosures, cables and host adapters, and complex software protocol stacks … Failures in these components can lead to downtime and/or data loss of the storage system.”

“Hence, in complex storage systems, component failures are very common and critical to storage system reliability,” they said.

Their findings, available in PDF format, are slated to be presented at this week’s 6th USENIX Conference on File and Storage Technologies (FAST).

The study’s authors analyzed almost five years’ worth of storage logs from 39,000 systems deployed at NetApp customer sites. Those systems include approximately 1.8 million disks, across 155,000 high-end, mid-range, low-end and backup shelf enclosures.

In addition to new statistics on the role of physical interconnects in failures, the researchers also found that protocol stacks were responsible for 5 to 10 percent of failures.

Fortunately for IT admins, the report also suggested some ways to help beat the odds.

For instance, storage subsystems tied together with redundant interconnects experienced 30 to 40 percent lower failure rates than those with a single interconnect, it said.

Additionally, spanning disks of a RAID group across multiple shelves in a system makes for a “more resilient” approach than using a single shelf, the study stated.
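The shelf-spanning idea can be illustrated with a small sketch. This is not taken from the study: the function names, the `(shelf, slot)` disk labels and the 4-shelf, 4-disk configuration are all hypothetical, chosen only to show how many disks of one RAID group would go offline together if a single shelf enclosure failed.

```python
from collections import Counter

# Hypothetical model: a disk is identified by (shelf, slot), and a layout
# is a list of RAID groups, each a list of disks. Not from the study.

def single_shelf_layout(num_shelves, disks_per_shelf):
    """Each RAID group uses every disk of exactly one shelf."""
    return [[(shelf, slot) for slot in range(disks_per_shelf)]
            for shelf in range(num_shelves)]

def spanned_layout(num_shelves, disks_per_shelf):
    """Each RAID group takes the same slot from every shelf."""
    return [[(shelf, slot) for shelf in range(num_shelves)]
            for slot in range(disks_per_shelf)]

def worst_case_loss(layout):
    """Largest number of disks any one group loses if one shelf fails."""
    return max(
        max(Counter(shelf for shelf, _slot in group).values())
        for group in layout
    )

if __name__ == "__main__":
    print(worst_case_loss(single_shelf_layout(4, 4)))  # whole group shares a shelf
    print(worst_case_loss(spanned_layout(4, 4)))       # one disk per shelf per group
```

In this toy configuration, a shelf failure takes out all four disks of a group in the single-shelf layout but only one disk per group in the spanned layout, a loss that a single-parity RAID group is designed to tolerate.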

Other design considerations could play a role in further reducing problems.

“Storage system designers should also think about using smaller shelves, with fewer disks per shelf, but with more shelves in the system,” the report said.

The research takes a somewhat wider view of storage problems plaguing enterprise datacenters, as a good deal of recent, high-profile research about storage failures has focused primarily on disk problems.

For instance, last year at FAST ’07, Google presented its own study on failure rates (available here in PDF format) based on experiences with 100,000 of its own PATA and SATA disk drives.

The Google study found that drives one year old or less had an annual failure rate of 6 percent and were at risk from colder temperatures, while high temperatures could lead to excessive failures in older drives.

That study also focused on the drives’ Self-Monitoring, Analysis, and Reporting Technology (SMART) and concluded that the feature — found in most drives used today — may not be up to snuff in accurately predicting disk failure. The Google research found that in 36 percent of failed drives, SMART did not flag any problems.

The authors of this year’s joint Illinois-NetApp study warned that focusing on drive-related problems can encourage enterprises to undertake unnecessary disk replacements to combat crashes, when failures can just as often be caused by other factors.

Similarly, the study also noted that low disk failure rates do not necessarily translate to a more reliable system.
