RealTime IT News

Providers Slow To Acknowledge E-mail Shortcomings

The explosive growth of high-speed Internet access has brought equally severe growing pains for e-mail users throughout the U.S.

And as the number of e-mail outages increases, tech staff at broadband Internet service providers nationwide are scrambling for a backup plan for when their servers are out of commission.

The fate of the e-mail server is tied to the fate of its domain servers, many of which are prone to denial of service attacks and hardware failures. But providers have been reluctant to take the steps necessary to keep the servers online in the event of system shutdowns.

Hardest hit recently, with well-publicized problems, are Verizon Communications and Road Runner, a digital subscriber line provider and a cable Internet provider, respectively.

Verizon, an incumbent local exchange carrier (ILEC), is just coming out of its latest e-mail outage, a three-day shutdown that stopped the flow of e-mail for potentially hundreds of thousands of users. Services, which were restored Thursday morning, ran a little slow as the overworked e-mail servers struggled to deal with the backlog.

A spokesperson for the Baby Bell said a hiccupping router was to blame for the outage, which constricted the flow of e-mail service coming in and going out.

The ILEC has had a raft of e-mail outages since it began aggressively marketing its high-speed DSL service to the masses last year. In February, a misconfigured domain name server shut out roughly 50,000 bellatlantic.net customers.

Road Runner, on the other hand, has been struggling to keep up with the thousands of new cable Internet customers flocking to its service. In just a couple of years, the company has grown from a fledgling high-speed option for a very few to a mainstream requirement for more than a million broadband junkies.

Last week, about 100,000 Texas, Louisiana and Mississippi Road Runner customers were left stranded after a faulty server kept them from sending or receiving e-mail for a week.

The problem was attributable to its primary e-mail server going into a self-check and instructing the backup e-mail server to shut down also, said Lidia Agraz, a Road Runner spokesperson. A replacement server was brought in Wednesday, but it couldn't keep up with the volume of e-mails.

By the end of the week, Road Runner had four million e-mails in its queue. The server spent the weekend playing catch-up.

What are these broadband providers doing to address the rampant e-mail outage problems? What can be done to address the problem?

The answer, it seems, is as easy to state as it is complex to carry out.

R. Scott Perry, president of e-mail filtering company Computerized Horizons, said these ISPs are going to have to revamp their e-mail server architectures and treat them as separate entities from the rest of their networks.

"I've seen lots of poorly designed mail servers; often, people just don't know better," Perry said. "As an example, a division of AT&T that operates a cell phone system had an e-mail paging gateway. They only had one MX record, so if their mail server went down or was unreachable, mail couldn't get through. Even worse, it was on the same box as their (Domain Name System) server. They have added backup mail servers in the past few months, but still use the DNS server as a backup."

An MX record controls which mail server deals with the e-mail sent to a particular domain or hostname. For example, e-mail sent to @austin.rr.com may be routed through three e-mail servers Road Runner has set aside to accept and route e-mail for that domain.

The snarls come when too many e-mails are sent to a particular domain. If too many e-mails are sent and received by @austin.rr.com, it can bog down the three e-mail servers.

A check of some of the major domain names shows that not enough e-mail servers are online to handle the number of e-mails sent and received. The figure after each domain is the number of mail servers it lists:

  • home.com - 4
  • verizon.com - 3
  • bellatlantic.com - 2
  • gte.com - 2
  • twcnyc.com (RR, New York City) - 2
  • houston.rr.com - 1
  • minnesotaroadrunner.com (Twin Cities) - 1.
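
As a rough illustration of the tally above, the following Python sketch flags the domains that list only one mail server and so have nothing to fall back on. The counts are the ones reported in this article; the function name and the threshold of two servers are assumptions made for the example, not an industry rule.

```python
# Mail server counts per domain, as listed in the article.
MAIL_SERVER_COUNTS = {
    "home.com": 4,
    "verizon.com": 3,
    "bellatlantic.com": 2,
    "gte.com": 2,
    "twcnyc.com": 2,
    "houston.rr.com": 1,
    "minnesotaroadrunner.com": 1,
}


def domains_without_backup(counts, minimum=2):
    """Return the domains listing fewer than `minimum` mail servers, sorted by name.

    `minimum` is a hypothetical threshold chosen for this sketch.
    """
    return sorted(domain for domain, n in counts.items() if n < minimum)


if __name__ == "__main__":
    for domain in domains_without_backup(MAIL_SERVER_COUNTS):
        print(f"{domain}: single mail server, no fallback")
```

A count alone does not show whether the listed servers sit on separate hardware or separate networks, so a domain that passes this check is not necessarily redundant.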

The problems occur when the domain server shuts down, as was the case with Road Runner's self-check, or when the network comes under a denial-of-service attack. That's where redundancy becomes an all-important issue, one that providers have, until recently, been unable to address.

Perry points out that the software and hardware exist to handle the needs of the busiest e-mail servers out there. All it takes is the wherewithal to spend the money on new equipment, backing up the network on another physical network and taking advantage of the Domain Name System to configure the networks.

"DNS allows for quite a bit of flexibility with mail servers," Perry said. "The MX records allow you to give each mail server a priority. For example, you could have a primary mail server with a priority of 10, and a backup mail server with a priority of 20. Under normal conditions, all mail would go to the primary mail server. But, if it was unavailable, mail would go to the secondary server.

"Another way that DNS can handle routing is by having multiple IP addresses for a host name," Perry continued. "For example, 'yahoo.com' has one primary MX record of 'mx1.mail.yahoo.com'. That means that any mail going to yahoo.com will go to mx1.mail.yahoo.com (unless it is down or unreachable). However, mx1.mail.yahoo.com can resolve to any of about 10 IP addresses. So although a mail server will always go to mx1.mail.yahoo.com first, it will randomly try one of those 10 different IP addresses."

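The two DNS mechanisms Perry describes can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual configuration: the host names, the IP addresses (taken from ranges reserved for documentation) and the choose_delivery_address function are all hypothetical, and the reachability check is supplied by the caller rather than performed over the network.

```python
import random

# Hypothetical MX records for an example domain: (priority, host).
# A lower priority number means the host is tried first.
MX_RECORDS = [
    (20, "backup.mail.example.com"),
    (10, "primary.mail.example.com"),
]

# Hypothetical address records: one host name can map to several IPs.
A_RECORDS = {
    "primary.mail.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
    "backup.mail.example.com": ["198.51.100.1"],
}


def choose_delivery_address(mx_records, a_records, is_reachable, rng=None):
    """Return (host, ip) for the first reachable address, or None.

    Hosts are tried in ascending priority number. Within one host, the
    IP addresses are tried in random order, which spreads the load.
    """
    rng = rng or random.Random()
    for _priority, host in sorted(mx_records):
        addresses = list(a_records.get(host, []))
        rng.shuffle(addresses)
        for ip in addresses:
            if is_reachable(ip):
                return host, ip
    return None


if __name__ == "__main__":
    # Simulate an outage of every primary address (assumed for the demo).
    primary_down = lambda ip: not ip.startswith("192.0.2.")
    print(choose_delivery_address(MX_RECORDS, A_RECORDS, primary_down))
```

With every address reachable, the sketch always picks the priority-10 host and one of its addresses at random. If none of those addresses respond, it falls through to the priority-20 backup, which is the behavior a domain with a single MX record and a single address cannot offer.
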
The latest e-mail difficulties have convinced Road Runner executives to find a more permanent solution to the company's e-mail problem, something customers say it had been unwilling to do until last week.

"We are taking the precaution right now to upgrade the (e-mail) server and have set up another formal backup," Agraz said. "Now that we have taken care of the customers, we are going back to analyze everything that happened and come back with other measures that would ensure total redundancy and ensure that this never happened again."