PALO ALTO, Calif. — The strategy for multicore processing has, at least in the early years, been one of gluing multiple cores onto the processor die and wiring the cores together. With the next generation of chips, however, both Intel and AMD are aiming at true integration and making their multi-core chips scale to their fullest potential.
Both firms gave hints on their future server processors here on the campus of Stanford University at the 21st annual Hot Chips conference. The summer show, aimed at the fastest-spinning propellerheads out there, is usually a deep dive into microprocessor technology.
AMD (NYSE: AMD), playing catch-up to Intel in server processors, discussed its Magny-Cours (M-C) 12-core processor, due next year. The name comes from a French race track but when pronounced sounds like “many cores.”
Intel (NASDAQ: INTC) didn’t want to give too much away, partly because it has other sessions planned for the two-day conference and partly because its own big show, the Intel Developer Forum, takes place Sept. 22-24. However, it did offer a look into its next high-end processor, the Nehalem-EX, due later this year.
Both M-C and Nehalem-EX use huge amounts of caching. Magny-Cours has 12MB of L3 cache while Nehalem-EX has 24MB, representing a twofold increase over their predecessor parts, “Istanbul” and the Xeon 7400, respectively.
Sailesh Kottapalli, an engineer with Intel, discussed the ring architecture for the EX, which is not unlike the ring used in Intel’s upcoming GPU, Larrabee. This simple ring topology connects all cache agents, and Intel made it bi-directional to reduce latency and improve bandwidth. The change improves bandwidth between the cores by a factor of four.
The EX uses a simple rotary-type protocol where traffic moves one core at a time around the ring like a bus stopping along a route. The fabric is scalable, so “as you add cores, the bandwidth scales with it,” Kottapalli said.
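The latency benefit of a bi-directional ring can be illustrated with a short hop-count sketch. It is not Intel's protocol: the function names and the stop count of eight are assumptions chosen for illustration, and it models only distance around the ring, not the bandwidth figure quoted above.

```python
def ring_hops(src: int, dst: int, stops: int, bidirectional: bool = True) -> int:
    """Hops from stop `src` to stop `dst` on a ring with `stops` agents."""
    clockwise = (dst - src) % stops
    if not bidirectional:
        # One-way ring: traffic can only travel clockwise.
        return clockwise
    counter_clockwise = (src - dst) % stops
    # Two-way ring: take whichever direction is shorter.
    return min(clockwise, counter_clockwise)


def average_hops(stops: int, bidirectional: bool) -> float:
    """Average hops from stop 0 to every other stop on the ring."""
    total = sum(ring_hops(0, dst, stops, bidirectional) for dst in range(1, stops))
    return total / (stops - 1)


STOPS = 8  # assumed number of ring stops, for illustration only

one_way = average_hops(STOPS, bidirectional=False)
two_way = average_hops(STOPS, bidirectional=True)
print(f"one-way ring: {one_way:.2f} hops on average")
print(f"two-way ring: {two_way:.2f} hops on average")
```

Under these assumptions the worst-case trip falls from seven hops to four, because a message never has to travel more than halfway around the ring.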
Magny-Cours uses a multi-chip module (MCM) design, which AMD long ridiculed Intel for using with its early Xeons, saying they were not true multicore processors. M-C will use two six-core dies in a single package, connected with a new, faster HyperTransport interface. AMD has defended the move to MCM, saying a native 12-core processor would simply run too hot, and it wants to keep the Magny-Cours in the same thermal envelope as the six-core designs.
Memory interfaces are also expanding, as these chips are going into servers with 32GB or 64GB of memory, or more. Magny-Cours has four HyperTransport ports and four memory channels. Nehalem-EX has two memory controllers on the die, each of which controls two memory buffers, and each buffer supports two DDR3 memory channels.
So all told, the Nehalem-EX addresses eight DDR3 channels, with two DIMMs per channel. The current Nehalem processor addresses three channels.
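The channel count follows from simple multiplication, sketched below. The constant names are illustrative, and the sketch assumes the reading that each of the two controllers drives two buffers and each buffer drives two channels. The DIMM total is derived from those figures rather than quoted from Intel.

```python
# Figures as described in the article; names are illustrative.
MEMORY_CONTROLLERS = 2
BUFFERS_PER_CONTROLLER = 2
CHANNELS_PER_BUFFER = 2
DIMMS_PER_CHANNEL = 2

channels = MEMORY_CONTROLLERS * BUFFERS_PER_CONTROLLER * CHANNELS_PER_BUFFER
dimms = channels * DIMMS_PER_CHANNEL

print(f"DDR3 channels per processor: {channels}")
print(f"DIMMs per processor: {dimms}")
```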
Magny-Cours will achieve improved speed through a feature called HT Assist, which is also in the new Istanbul processor. One megabyte of L3 cache is reserved as a directory for all of the cache lines used in the system, so when a core needs data, rather than probe every cache on the processor, it just goes to the directory to find where that data is located.
Older Opterons had to probe every core and cache for data, which added overhead on a quad-core system. On a six- or 12-core system, that overhead would be unacceptable, so the directory was created.
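The difference between probing every cache and consulting a directory can be sketched in a few lines. This is a simplified software model, not AMD's hardware design: the `Cache` and `Directory` classes, the function names and the choice of 12 caches are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


class Cache:
    """A toy cache that only remembers which line addresses it holds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lines: Set[int] = set()


def broadcast_lookup(caches: List[Cache], address: int) -> Tuple[Optional[str], int]:
    """Older approach: probe every cache and count the probes sent."""
    owner = None
    probes = 0
    for cache in caches:
        probes += 1
        if address in cache.lines:
            owner = cache.name
    return owner, probes


class Directory:
    """Directory approach: one table records where each cached line lives."""

    def __init__(self) -> None:
        self.owner_of: Dict[int, str] = {}

    def record_fill(self, cache: Cache, address: int) -> None:
        # Track the line when a cache loads it.
        cache.lines.add(address)
        self.owner_of[address] = cache.name

    def lookup(self, address: int) -> Tuple[Optional[str], int]:
        # One directory read, then at most one directed probe.
        owner = self.owner_of.get(address)
        return owner, (1 if owner is not None else 0)


caches = [Cache(f"cache{i}") for i in range(12)]  # assumed count, for illustration
directory = Directory()
directory.record_fill(caches[5], 0x40)

print(broadcast_lookup(caches, 0x40))  # probes all 12 caches
print(directory.lookup(0x40))          # probes only the owner
```

In this model the broadcast cost grows with the number of caches, while the directory cost stays at one probe at most. The price is the storage set aside for the directory itself, which the article puts at one megabyte of L3.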
AMD plans a socket-compatible upgrade to Magny-Cours with more cores for additional thread-level parallelism and more cache to maintain the cache-per-core balance, all in the same power envelope. Beyond that, in 2011, the company is planning to introduce a new 64-bit microarchitecture codenamed “Bulldozer,” which will have a whole new instruction set for improved memory parallelism.
In a slight detour from the chip talk, Kevin Leigh, a distinguished technologist and BladeSystem architect lead for HP (NYSE: HPQ), came out to discuss the advent of blade servers. He tied it to the previous talks by noting that “there are things that blades helped with chip innovations and how in turn these innovations helped make blade servers be more desirable.”
To deal with the explosion in network switches, HP introduced Virtual Connect to make switches transparent to network admins. That meant that rather than reassigning ports, only the NIC firmware had to be changed, making it easier to administer the ports.
Next came Flex-10, which virtualizes the provisioning of network ports. Leigh noted that there were a lot of under-provisioned network ports when each blade had its own Ethernet port. The solution was a single, shared 10 GbE port that was partitioned among blades with programmable bandwidth management.
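The idea of splitting one shared 10 GbE port by programmable shares can be sketched as a weighted division. This is not HP's implementation or management interface: the function name, the blade names and the weights are hypothetical, and the only figure taken from the talk is the 10 Gb/s port capacity.

```python
from typing import Dict

PORT_CAPACITY_MBPS = 10_000  # one shared 10 GbE port


def partition_port(weights: Dict[str, int],
                   capacity_mbps: int = PORT_CAPACITY_MBPS) -> Dict[str, int]:
    """Divide the port's bandwidth among blades in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one blade needs a positive weight")
    # Integer division keeps the sum at or below the port capacity.
    return {blade: capacity_mbps * weight // total
            for blade, weight in weights.items()}


# Hypothetical blades and weights, for illustration only.
shares = partition_port({"blade1": 1, "blade2": 1, "blade3": 2})
for blade, mbps in shares.items():
    print(f"{blade}: {mbps} Mb/s")
```

Changing a blade's share then means changing a number rather than recabling, which is the administrative saving Leigh described.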
The third innovation driven by necessity was the Virtual Ethernet Port Aggregator (VEPA). When virtual machines on the same server communicated using virtual NICs and switches, the traffic was not visible to network administrators. They wanted visibility into that traffic, even if it was staying within the same machine.
All of these changes, driven by the move to a bladed environment, have helped reduce hardware, administration headaches and power draw, Leigh argued.
Hot Chips continues today with talk about Intel’s “Moorestown” Atom processor as well as “Westmere,” Intel’s first 32-nanometer chip that comes with a graphics processor on the same die. The chip is scheduled to go into production later this year.
Nvidia will also be on hand to show off the latest in its Ion platform, which combines Intel’s Atom processor with a GeForce graphics chip. On Tuesday, IBM will make the first public disclosures of its Power 7 processor, its high-end RISC