RealTime IT News

COTS Spotted as Key to Clusters

SAN JOSE, Calif. -- The hottest trend in high-performance computing clusters is the use of commercial off-the-shelf (COTS) products, according to Intel's multi-threading expert.

David Kuck, enterprise platforms group fellow and director of Intel's KAI Software Lab, said price, performance and interconnects are improving at such a rate that clustering is becoming more accessible to enterprise customers and may even force some longstanding supercomputer owners to rethink their designs.

"These COTS systems have improved in the recent past and the proprietary as well as open software are seeing gains," Kuck said during his keynote at the ClusterWorld conference here. "Processors are coming down in price and are increasing in the number of multiple threads and scaling up in performance. When you think about the fact that you can have slow or fast Ethernet, InfiniBand and other fabric switching, you can now pick and choose what kinds of processors you want to use."

Kuck said the biggest reason commercial off-the-shelf products are gaining ground is that more CIOs are hearing how they work on several levels.

"JP Morgan/Chase recently came out in favor of these COTS clusters," Kuck said. "There are more and more stories coming out about ROI and it's a matter of awareness. Linpack [benchmarking] tests are no longer a true indicator of what is going on."

The Santa Clara, Calif.-based chip-making giant is heavily involved in improving not only processor speeds but interconnects as well. One of the ways Intel said it would help is by standardizing the pin architecture of its future Xeon and Itanium processors. The suggestion came up in conversation during a press briefing in January 2004. An Intel spokesperson said the company is using 2007 as the target date by which the two disparate chip architectures would become relatively interchangeable.

Kuck said Intel is adding PCI Express and InfiniBand functions as well as advancing its throughput computing initiative, which relies on running multiple threads. The technique allows one thread to sleep while another is woken up at the right time to operate independently. Intel projects that the number of threads per die, as well as the number of multiprocessor dies, is growing exponentially, stepping up approximately every two years.

Current configurations can handle four threads. That number is expected to increase to 16 threads in 2005 and to some 64 threads by 2007. The trend should peak in 2011 with more than 200 threads per die, according to Kuck. He said Intel is now working with an unnamed OEM on a cluster-level simulator that tests against a customer's specifications rather than a series of artificial benchmarks.

Intel currently has threading tools but plans to release cluster-specific tools in a technology preview later this month, followed by a continued rollout over the next two years, with a clearer picture expected by the end of 2005.

New to the industry is the idea of "constellations," which Kuck describes as reasonably connected symmetric multiprocessing (SMP) boxes. Another configuration popular in the sector is the distributed virtual memory model, which Kuck thought was a bad idea at first.

"If you have an OpenMP program and you want to scale that to four nodes, we have an OpenMP directive that will take care of that," he said. "But you may still have to think about what you want to use that for. OpenMP clustering is not for all applications at this point. It works mostly for read-only programs. It's good for bioinformatics codes, but there is lots of room for improvement."

One area for improvement is legacy systems, which barely scratch the surface when it comes to peak performance, Kuck said. For example, the Japanese Earth Simulator Cluster, the fastest supercomputer in the world, has a large number of projects that run at just over 30 percent of the peak performance it was designed for.

"If you look at the U.S. National Laboratories, their numbers are closer to 4 percent or 6 percent of peak operating capacity. If they get to 10 percent, it's a big deal," Kuck told internetnews.com. "Part of the problem is that the programs are 10 years old in some cases. The other obstacle is that interconnect technology has vastly improved since some of these systems were installed."

Kuck said Intel is currently revisiting each of its supercomputing contracts in an effort to upgrade the systems with a combination of chips and higher-speed connections. Over the next three years, Intel is contracted to supply more than 4,500 Itanium 2 units to the U.S. and Chinese grids, which Kuck said would influence deployments at hundreds of universities. The lessons learned during the upgrade process are expected to filter down to Intel's enterprise clustering strategy.