Multi-Core The Cool Factor at Hot Chips Conference

STANFORD, Calif. — Hot Chips is an appropriate term for a conference on future trends in microprocessors. On a rare hot day here in Memorial Hall on the campus of Stanford University, even the air conditioners fail to counterbalance the heat from notebooks that adorn practically every lap in the auditorium.

Inside, the talk among engineers and computer scientists is about multi-core and all things multi-core. Intel and AMD have shifted their strategy from clocks to cores, and every demonstration, from graphics cards to research projects, showed off multi-core efforts as well.

The problem is that while the hardware engineers have made a monumental effort to build the multi-core machines, the applications have not come. That’s because parallel programming is a complicated science that’s driving even the impressive collection of PhDs at this show up the wall.

“A lot of it is compiler science that needs to be updated to make programming [multithreaded applications] easier, and it will happen,” said Peter Glaskowsky, technology analyst for Envisioneering. “Multi-core is really good at a narrow class of applications. A lot of people are doing a lot of work so multi-core will benefit many kinds of applications.”

But just throwing cores at the problem won’t help without careful design, said Erik Lindholm, an Nvidia engineer and veteran of Silicon Graphics, in his keynote speech. Lindholm was discussing the scalar design of Nvidia’s most recent video chip, the G80, which is found in the 8800 line of cards.

“You can’t build infinitely wider hardware, your scalability goes down,” he said. There must be balance between workload units. In the case of a video card, that means balancing the pixel processors, vertex engines and triangle animation. “You don’t want to emphasize one part of the shader and stall out another. That will cause bubbles in the pipeline.”

Nvidia discussed its Compute Unified Device Architecture, or CUDA, a technology for writing applications in the C language that utilize the computation power of the G80. The company has introduced a line of computers under the Tesla brand name.

The Tesla products are designed to aid in heavy computation projects, especially floating-point calculations, in science and medicine. The G80 can handle up to 12,288 threads and has 128 thread cores. CUDA is designed to address the threading problem by allowing a programmer to write multi-threaded applications with just a few lines of C code.
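
To make those thread numbers concrete, here is a rough sketch in plain Python of the style of programming described above, in which each thread handles one element of the data. It is not CUDA and it runs sequentially. The names `add_kernel` and `launch` are hypothetical, invented for this illustration, and the even split of threads across cores is an assumption rather than a figure from Nvidia.

```python
# Illustrative only: plain Python that mimics the per-thread indexing style
# of a data-parallel kernel. It runs sequentially and is not CUDA code.

MAX_THREADS = 12_288   # thread capacity quoted in the article
THREAD_CORES = 128     # thread cores quoted in the article

# Threads in flight per core, assuming the load is spread evenly.
threads_per_core = MAX_THREADS // THREAD_CORES


def add_kernel(thread_id, a, b, out):
    """Work done by one logical thread: add a single pair of elements."""
    if thread_id < len(out):      # guard threads that fall past the data
        out[thread_id] = a[thread_id] + b[thread_id]


def launch(kernel, num_blocks, block_size, *args):
    """Hypothetical launcher: visit every (block, thread) pair in turn."""
    for block in range(num_blocks):
        for thread in range(block_size):
            kernel(block * block_size + thread, *args)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
result = [0.0] * len(a)

# Two blocks of four logical threads cover the five elements.
launch(add_kernel, 2, 4, a, b, result)
```

The programmer writes only the small per-element function; spreading the work across thousands of threads is left to the launcher, which on real hardware would run them in parallel.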

AMD followed with a demonstration of its HD 2900 video card, but stuck to promoting it as a graphics processor. “To us, whether you are playing video or doing 3D, it’s a form of decoding and decompression… so our view of the graphics chip is it’s a decoder and decompressor,” said Mike Mantor, a Fellow at AMD.

Intel showed off its 80-core prototype, which was designed to be a network on a chip with teraflop performance, and running at under 100 watts. The caveat to this prototype is that it’s not compatible with x86 systems. Right now, it remains a lab experiment.

The chip uses a tile design for the cores, in an eight-by-ten grid. Each tile has a router connecting the core to an on-chip network that links all the cores together, rather than making them go through a frontside bus as Intel’s Core 2 and Xeon processors do. Thanks to its advanced sleep technology, Intel estimates a two- to five-fold reduction in power leakage.
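
As a rough illustration of how a message might cross such a grid, the Python sketch below counts router-to-router hops between tiles. The article does not say which routing scheme the chip uses, so the row-first, column-second ("dimension-order") routing here is an assumption, as are the tile numbering and the function names.

```python
# Illustrative sketch of hop counts on an eight-by-ten tile mesh.
# Assumptions: tiles are numbered row by row starting at 0, and messages
# travel along the row first, then along the column.

ROWS, COLS = 8, 10   # grid size from the article; orientation is assumed


def tile_coords(tile):
    """Convert a tile number into (row, column)."""
    return divmod(tile, COLS)


def xy_route(src, dst):
    """List the tiles a message passes through, including both endpoints."""
    row, col = tile_coords(src)
    dst_row, dst_col = tile_coords(dst)
    path = [src]
    while col != dst_col:                 # step along the row first
        col += 1 if dst_col > col else -1
        path.append(row * COLS + col)
    while row != dst_row:                 # then step along the column
        row += 1 if dst_row > row else -1
        path.append(row * COLS + col)
    return path


def hops(src, dst):
    """Number of router-to-router links the message crosses."""
    return len(xy_route(src, dst)) - 1


corner_to_corner = hops(0, ROWS * COLS - 1)   # opposite corners of the grid
```

Under these assumptions the longest trip, from one corner tile to the opposite corner, crosses 16 links, while neighboring tiles are a single hop apart.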

The many-core speeches continued with Madhu Saravana Sibi Govindan of the University of Texas at Austin, who discussed UT’s own multi-core project, TRIPS (The Tera-op, Reliable, Intelligently adaptive Processing System).

TRIPS uses a design known as EDGE, Explicit Data Graph Execution, which executes a stream of individual instructions as a block. Processors today function by executing instructions one at a time, very fast. EDGE attempts to run as many instructions as possible in one block.

TRIPS can execute up to 16 instructions per cycle, whereas the Intel Core 2 processor can only do four. Because of its large blocks, a 366MHz prototype was able to flatten a Pentium 4 in some benchmarks, while it was flattened in others. At this point, the processor and the code for it are still in the development stages, and Govindan said maximum performance required hand coding, a skill not many people have acquired.

The final multi-core performance came from Tilera, which had its grand coming out party today. Anant Agarwal, the MIT professor who led the TILE64 processor’s development and is CTO of Tilera, talked about the need for efficiency in multi-core scaling.

“The problem is processors don’t scale, there is no power efficiency and the programming model is in the dark ages,” he said. The tools and the programming model are still oriented around single processor development. “Imagine opening a debugger on a 100 process application and having 100 windows for debugging. It just doesn’t work.”

The Tilera TILE64 chip has an almost identical design to the Intel prototype: a mesh layout of many cores interconnected by a high speed grid network and a router on each core to facilitate communication. Memory and network controllers are on the chip for maximum performance.

Asked after his speech what the difference is between the TILE64 and Intel’s prototype, Agarwal responded that the TILE64 is shipping now and has compilers, debuggers and other tools for building applications, but he declined to get into the technological differences.
