RealTime IT News

Disaster Tolerant Unix: An Ounce of Prevention

Of the many lessons learned from Sept. 11, an enduring one was the need for disaster planning. From the need for fast emergency exits to vigilant security to redundant IT systems, planning ahead for the worst has gained companies' attention.

One critical area that businesses need to address is the vast amounts of data stored on their servers. For top Unix vendors like HP and Sun Microsystems, the challenge is to meet the needs of a business with the capabilities of the system. As many companies have found out, there is no simple answer for how disaster tolerant their Unix servers should be.

"I think what 9/11 did was cause people to consider they might have a need for [disaster tolerance]," says David Freund, a research analyst with Illuminata, an IT consultancy. "The next step was exploring what that actually means."

Sketching a Disaster-Risk Profile
Industry analysts and Unix vendors agree that a company must look first to its overall business continuity plan, which will dictate what kind of protection it needs for its Unix servers.

"A disaster is when my business stops running," says Dan Klein, a marketing manager in HP's business-critical systems group. "That's the real disaster and that's the disaster you want to prevent."

In that sense, sketching a disaster plan has not changed all that much, says Klein. Businesses have been taking precautions against catastrophic events for hundreds of years, but the need to protect data has steadily risen since the 1970s, when it was very important, to today, when it is, in many ways, the business itself.

"The two things you're really concerned about is data and ability to use the data," says Freund. "Not everyone needs the same disaster tolerance."

The twin considerations are known as recovery-point objective (RPO) and recovery-time objective (RTO). The RPO is how much recent data, measured in time, a business can afford to lose, while the RTO is how long it can afford to be without its systems. A matrix built from these two objectives can show how business-critical a given set of data is.

"You need to make it as simple as possible," says Kevin Coyne, director of business operations for Sun's services unit. "What's the recovery time? How much downtime can you sustain? What's recovery point? What's the longest amount of time you can sustain loss of data? It becomes much easier then to develop a solution set."

Not all businesses, or even parts of businesses, are equal. For example, an investment bank has very little tolerance for disaster, since it needs all transactions preserved without any downtime. An air-traffic control system, on the other hand, needs all data up as soon as possible, yet does not have a critical need for historical information. Even within IT systems, needs differ. The back-office operations of a company might require a high degree of data accuracy but can take some time getting back up and running, while an online storefront would need to be up again immediately.
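
As a rough illustration of how such a matrix might work, the sketch below sorts three of the examples above into the cells of a two-by-two grid. The `Workload` type, the `classify` function, the 60-second cutoff and every number in it are assumptions made for illustration; none comes from HP, Sun or Illuminata.

```python
from dataclasses import dataclass

# Hypothetical cutoff: an objective at or below this many seconds counts as "strict".
STRICT_SECONDS = 60.0


@dataclass(frozen=True)
class Workload:
    name: str
    rpo_seconds: float  # longest window of data loss the business can accept
    rto_seconds: float  # longest downtime the business can accept


def classify(w: Workload, strict: float = STRICT_SECONDS) -> str:
    """Place a workload in one cell of a 2x2 RPO/RTO matrix."""
    strict_rpo = w.rpo_seconds <= strict
    strict_rto = w.rto_seconds <= strict
    if strict_rpo and strict_rto:
        return "no data loss, no downtime"
    if strict_rto:
        return "fast restart, some data loss acceptable"
    if strict_rpo:
        return "no data loss, slower restart acceptable"
    return "standard backup and restore"


# Illustrative figures only, chosen to match the examples in the text.
workloads = [
    Workload("investment bank", rpo_seconds=0, rto_seconds=0),
    Workload("air-traffic control", rpo_seconds=3600, rto_seconds=5),
    Workload("back office", rpo_seconds=0, rto_seconds=86400),
]

for w in workloads:
    print(f"{w.name}: {classify(w)}")
```

A real profile would likely use more than two tiers per objective, but even this coarse grid shows why one enterprise can end up with several different protection schemes.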

With a disaster-risk profile in hand, a company can then choose the technical details that meet its needs.

"You need to focus on the business operation, not the technology," says Klein. "You'll find different parts of an enterprise have different requirements."

Preparing For The Hundred-Year Flood or Flooded Basement?
After Sept. 11, Sun's Coyne says the need for protecting a company's data quickly moved up the line in importance. Where before one IT person would be charged with the task, some companies were forming business-continuity offices.

"Companies recognize they do have control of cost, complexity and level of availability," Coyne says. "They're pricing out the cost, then determining how much they're able to spend."

The importance companies have placed on protecting their data is critical, according to Illuminata's Freund. He cites a University of Texas Research Center on Information Systems study that found nearly half of companies that lose their data in disasters go out of business, while 90 percent close within two years.

With the stakes that high, Freund says CEOs and CIOs have good reason to treat data, in many cases, as their company's most valuable commodity.

"The cost," he adds, "is the reality check."

While companies might want the tightest possible RTO and RPO, they can quickly find the costs spiraling.

"There's typically a case of sticker shock," Freund says. "It's not inexpensive. It's a lot like buying an expensive insurance policy. And it's not all-or-nothing."

One of the first lines of defense is redundancy, in an attempt to eliminate a single point of failure that could knock out a company's data system. However, the task remains as frustrating as a game of whack-a-mole: once one single point of failure is eliminated, another pops up to take its place.

Disaster tolerance adds layers of redundancy to ensure that no single failure cripples IT functions. For Unix servers, this begins with clustering or mirroring. With so-called high-availability clustering, an application runs on multiple servers, while disk mirroring creates exact copies of data.

The World Trade Center's collapse reinforced the need for geographical dispersion of a company, including its servers. Freund says this is where the cost often comes into play.

"If you're out to increase distance, that distance doesn't come free," he points out. "The greater the distance, the longer it takes electrons to move down the cable. If it's up to the second, then it could slow down your system."
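
The size of that penalty can be estimated with simple arithmetic. The sketch below assumes a signal speed of roughly 200,000 kilometers per second, a common rule of thumb for optical fiber, and assumes the remote copy is updated synchronously, so each write waits for a full round trip. The function name and the sample distances are hypothetical, and the result is a lower bound that ignores switching equipment, protocol overhead and cable routes that are longer than the straight-line distance.

```python
# Assumed signal speed in optical fiber, about two-thirds the speed of light in a vacuum.
SIGNAL_SPEED_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay, in milliseconds, between two sites."""
    return 2.0 * distance_km * 1000.0 / SIGNAL_SPEED_KM_PER_S


# Hypothetical site separations, from across town to across a continent.
for km in (10, 100, 1000):
    delay = round_trip_ms(km)
    print(f"{km:>5} km apart: at least {delay:.2f} ms added to each synchronous write")
```

By this estimate, every 100 kilometers of separation adds at least a millisecond to each write that must be confirmed at the remote site, which is why the tightest recovery points get more expensive as the sites move farther apart.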

HP's Klein says protecting data will soon become just a cost of doing business, a guard against losing the most valuable part of a company.

"The basic concept has not changed," he says. "It's what we apply it to and the time factor that's changed."