After CPU vendors hit the performance wall several years ago and could no longer make their processors run any faster, the answer for increasing performance became going multi-core.
Well, that doesn’t appear to be helping much at the upper end of the performance spectrum. Tests conducted by engineers at Sandia National Laboratories in New Mexico found that multi-core processors provide little benefit in high performance computing (HPC), and in some cases actually hurt performance.
The problem is two-fold: first, as the Top500 supercomputer list shows,
most of these systems are based on the x86 architecture, and x86 was never meant to be a supercomputer processor. It has limitations in memory bandwidth and memory management that don’t show up in a personal computer but do show up in a supercomputer.
What the Sandia engineers found is that as more cores are added, performance degrades because the memory can’t keep up with the CPUs. Also, Intel and AMD both tend to ship their quad-core processors at lower clock speeds than their dual-core processors, so the individual cores are slowing down, too.
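A rough way to see the effect on ordinary hardware is a STREAM-style bandwidth test. The C sketch below is purely illustrative (it is not Sandia’s benchmark, and the array size and thread counts are arbitrary assumptions): it times a memory-bound loop at increasing thread counts, and on a typical multi-core x86 machine the reported bandwidth stops improving well before the core count runs out, because the loop is limited by the memory system rather than by the cores.

```c
/* Illustrative STREAM-style triad: lots of memory traffic, little math.
 * Not Sandia's benchmark -- array size and thread counts are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 25)   /* ~33M doubles per array, far larger than any cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    for (int threads = 1; threads <= 8; threads *= 2) {
        omp_set_num_threads(threads);
        double t0 = omp_get_wtime();

        /* Each iteration streams three arrays through memory, so the loop
         * is bound by memory bandwidth, not by how many cores it runs on. */
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];

        double gbytes = 3.0 * N * sizeof(double) / 1e9;   /* data moved per pass */
        printf("%d threads: %.1f GB/s\n", threads, gbytes / (omp_get_wtime() - t0));
    }

    free(a); free(b); free(c);
    return 0;
}
```

Built with something like gcc -O2 -fopenmp, the bandwidth figure typically flattens after a couple of threads; that flattening is the small-scale version of what Sandia reports across thousands of cores.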
Steve Conway, senior analyst for high performance computing at IDC, said this problem has been around for a while and that multi-core is only exacerbating it. “x86 processors were never designed for HPC,” he told InternetNews.com. “Those processors were not designed to communicate with each other at a high speed. With these big systems, you have to move data over large territories.”
IDC has been surveying customers about their HPC use and has found this problem occurring already. So far, 21 percent of the sites contacted by IDC said their applications run worse on newer hardware than they did on older hardware, and more than 50 percent said they expect to be in that situation within the next 12 to 24 months.
The memory issue is addressed in an article from the Institute of Electrical and Electronics Engineers (IEEE) publication IEEE Spectrum. The bottleneck the Sandia engineers found is the connection between the CPU and memory.
Intel’s frontside bus has long been considered a bottleneck between the CPU and memory because all four cores have to go through that single gateway. Although the number of cores per processor keeps increasing (Intel has a six-core Xeon and plans an eight-core Nehalem processor next year), the number of connections between the CPU and memory has not kept pace.
Other problems
There are other causes for the slowdown as well. Many applications simply aren’t meant to be parallelized and would be better off with a single-core processor running at 4GHz or 5GHz than with eight cores at 2GHz. It all depends on the nature of the app.
Apps called “embarrassingly parallel” are the easiest to run on multi-core because the work can be sliced up and distributed across all those cores, and no core is dependent on the results of the work done on other cores.
SETI@Home and other distributed computing projects are an ideal example of this. Each client picks off a piece of data from a large pile and processes it independently of everyone else’s work.
But when the results of step A are needed to process step B, step B is going nowhere until it gets the data it needs, and all the parallelization in the world is useless.
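The difference is easy to see in a few lines of code. The C sketch below is only an illustration (the work() function and array sizes are made up, not drawn from any of the systems discussed here): the first loop is SETI@Home-style, embarrassingly parallel work, where every element can be handed to a different core, while the second is a chain in which each step needs the previous step’s result, so extra cores buy nothing.

```c
/* Illustrative only: independent work parallelizes, a dependency chain doesn't. */
#include <stdio.h>

#define N 1000000

/* Stand-in for whatever processing a real application would do. */
static double work(double x) { return x * 0.5 + 1.0; }

int main(void) {
    static double data[N], out[N];

    /* Embarrassingly parallel: each element is handled on its own, so the
     * iterations can be split across as many cores as are available. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        out[i] = work(data[i]);

    /* Dependent chain: iteration i can't start until iteration i-1 is done,
     * so the loop runs one step at a time no matter how many cores exist. */
    double chain = 0.0;
    for (int i = 0; i < N; i++)
        chain = work(chain + data[i]);

    printf("%f %f\n", out[N - 1], chain);
    return 0;
}
```

In the first loop the cores never have to wait on one another; in the second, the dependency forces each step to wait, which is exactly the situation where more cores do no good.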
The third problem facing these supercomputers is their massive physical size. Supercomputers now occupy very large buildings and consist of rows upon rows of racks or blades. What happens when a core needs data that’s in memory on a rack 80 feet away? A long wait, at least in supercomputer terms.
Conway said the problem exists with AMD’s Opteron as well, which does not use a frontside bus, and that Intel’s move to the Nehalem architecture, which eliminates the FSB, won’t help either. “With a single core processor, you had x amount of memory [bandwidth]. Now that processor has four cores, each one has one-quarter of that memory bandwidth,” he said.
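To put rough, hypothetical numbers on Conway’s point: if a socket can pull 10GB/sec from memory and carries a single core, that core gets the full 10GB/sec. Put four cores behind the same memory interface and, when all four are streaming data at once, each averages only about 2.5GB/sec, even though no individual part of the system got any slower.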