COTS Spotted as Key to Clusters

SAN JOSE, Calif. — The hottest trend in high-performance computing clusters is the use of commercial off-the-shelf (COTS) products, according to Intel’s multi-threading expert.

David Kuck, enterprise platforms group fellow and director of Intel’s KAI Software Lab, said price, performance and interconnects are improving at such a rate that clustering is becoming more accessible to enterprise customers and may even force some longstanding supercomputer owners to rethink their designs.

“These COTS systems have improved in the recent past and the proprietary
as well as open software are seeing gains,” Kuck said during his keynote at
the ClusterWorld conference here. “Processors are coming down in price and
are increasing in the number of multiple threads and scaling up in
performance. When you think about the fact that you can have slow or fast
Ethernet, InfiniBand and other fabric switching, you can now pick and choose
what kinds of processors you want to use.”

Kuck said the biggest reason commercial off-the-shelf products are
gaining ground is that more CIOs are hearing the story of how it works on
several levels.

“JP Morgan/Chase recently came out in favor of these COTS clusters,” Kuck
said. “There are more and more stories coming out about ROI and it’s a
matter of awareness. Linpack [benchmarking] tests are no longer a true
indicator of what is going on.”

The Santa Clara, Calif.-based chip-making giant is working to improve not
only processor speeds but interconnects as well. One of the ways Intel said
it would help is by standardizing the pin architecture of its future Xeon
and Itanium processors. The suggestion came up in conversation during a
press briefing in January 2004. An Intel spokesperson said the company is
using 2007 as the target date by which the two disparate chip architectures
would become relatively interchangeable.

Kuck said Intel is adding PCI Express and InfiniBand functions as well as
advancing its throughput computing initiative by running multiple threads.
The technique puts one thread to sleep and wakes another at the right time
so that each can operate independently. Intel projects that the number of
threads per die, as well as the number of multiprocessor dies, is increasing
exponentially, stepping up approximately every two years.

Current configurations can handle four threads. That number is expected to
increase to 16 threads in 2005 and to some 64 threads by 2007. The trend
should peak in 2011 with more than 200 threads per die, according to Kuck.
He said Intel is now working with an unnamed OEM on a cluster-level
simulator that tests against a customer’s specifications rather than a
series of artificial benchmarks.
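The growth rate implied by those thread counts can be checked with a little arithmetic. The sketch below is illustrative only: it assumes 2004 as the year of the "current" four-thread figure (the article does not state it) and treats the 2011 figure of "more than 200" as exactly 200, so the last interval is a lower bound on growth. The function name is hypothetical.

```python
import math

# Threads per die as quoted in the article. The 2004 entry is an assumption
# (the article only says "current"); 200 stands in for "more than 200".
projection = {2004: 4, 2005: 16, 2007: 64, 2011: 200}


def doubling_time_years(y0, n0, y1, n1):
    """Years per doubling implied by growth from n0 to n1 threads."""
    return (y1 - y0) / math.log2(n1 / n0)


years = sorted(projection)
for y0, y1 in zip(years, years[1:]):
    t = doubling_time_years(y0, projection[y0], y1, projection[y1])
    print(f"{y0}->{y1}: one doubling every {t:.2f} years")

# Doubling time over the whole projected span.
overall = doubling_time_years(2004, projection[2004], 2011, projection[2011])
print(f"overall: one doubling every {overall:.2f} years")
```

Under these assumptions the early steps double faster than every two years and the final step more slowly, so the quoted figures describe a curve that flattens toward 2011 rather than a constant rate.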

Intel currently has threading tools but plans to release cluster-specific
tools in a technology preview later this month, with a continued rollout
over the next two years and a fuller picture by the end of 2005.

New to the industry is the idea of “constellations,” which Kuck describes
as reasonably connected symmetric multiprocessing (SMP) boxes. Another
configuration popular in the sector is the distributed virtual memory model,
which Kuck thought was a bad idea at first.

“If you have an OpenMP program and you want to scale that to four nodes,
we have an OpenMP directive that will take care of that,” he said. “But you
may still have to think about what you want to use that for. OpenMP
clustering is not for all applications at this point. It works mostly for
read-only programs. It’s good for bioinformatics codes, but there is lots
of room for improvement.”

One area for improvement is legacy systems, which barely scratch the
surface of their peak performance, Kuck said. For example, the Japanese
Earth Simulator cluster, the fastest supercomputer in the world, has a
large number of projects that run at just a bit more than 30 percent of the
peak performance it was designed for.

“If you look at the U.S. National Laboratories, their numbers are closer
to 4 percent or 6 percent peak operating capacity. If they get to 10
percent, it’s a big deal,” Kuck said. “Part of the problem is that the
programs are 10 years old in some cases. The other obstacle is that
interconnect technology has vastly improved since some of these systems
were installed.”
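The percentages Kuck cites are ratios of sustained to peak throughput. As a rough illustration of why a move from about 5 percent to 10 percent would be "a big deal," the sketch below uses hypothetical figures; the peak and sustained numbers are not measurements of any system named in this article, and the function names are invented for the example.

```python
def peak_fraction(sustained_tflops, peak_tflops):
    """Fraction of theoretical peak that an application actually sustains."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return sustained_tflops / peak_tflops


def throughput_gain(current_fraction, target_fraction):
    """Factor by which delivered throughput grows on the same hardware."""
    if current_fraction <= 0:
        raise ValueError("current_fraction must be positive")
    return target_fraction / current_fraction


# Hypothetical machine: 10 TFLOPS peak, application sustaining 0.5 TFLOPS.
current = peak_fraction(0.5, 10.0)
gain = throughput_gain(current, 0.10)
print(f"current: {current:.0%} of peak; reaching 10% delivers {gain:.1f}x")
```

In this hypothetical case, reaching 10 percent of peak doubles the work delivered without adding any hardware.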

Kuck said Intel is currently revisiting each of its supercomputing
contracts in an effort to upgrade the systems with a combination of chips
and higher-speed connections. Intel has been contracted to supply more than
4,500 Itanium 2 units over the next three years for the U.S. and Chinese
grids, which Kuck said would influence deployments at hundreds of
universities. The lessons learned during the upgrade process are expected to
filter down to Intel’s enterprise clustering strategy.
