Intel’s new “Ice Lake” third-generation Xeon Scalable processors are reviving interest in liquid cooling, an old concept that’s begun to move into the mainstream in recent years.
Lenovo officials earlier this month announced new servers powered by the Intel chips, extending the reach of the company’s Neptune warm-water cooling technology deeper into rack systems optimized for workloads such as artificial intelligence (AI) and high-performance computing (HPC).
The same day, Microsoft announced that it was evaluating a two-phase immersion cooling technology in which entire racks of data center hardware are submerged in a couch-shaped tank of specially engineered fluid that won’t damage the electronics or affect the performance of the systems.
The two announcements were the latest indicators that system makers, cloud providers and enterprises are continuing to look at liquid cooling technologies for data centers where hardware components run ever hotter, demand for greater density is growing, and compute- and data-intensive workloads like AI, advanced analytics and HPC are expanding.
In addition, the rise of edge computing exacerbates these trends, with more data processing, storage and analytics running closer to where users work and data is generated, and with small, remote data center environments being deployed to support that work.

Most systems and their components are still cooled by air, with data centers designed around hot-aisle/cold-aisle configurations that circulate cool air past the systems and carry the warm exhaust away.
Efficiency Is Key
Liquid, however, is a more efficient coolant than air. It has been used in massive systems such as mainframes and supercomputers in the past, and OEMs have offered liquid-cooling technologies in some of their enterprise-level hardware for more than a decade. Still, liquid cooling has challenges to overcome: the natural hesitancy among IT leaders to run liquid so close to their electronics, enterprises’ historically slow adoption of new technologies, and a shortage of technicians skilled in managing liquid-cooling systems.
As the announcements from Lenovo and Microsoft show, liquid cooling continues to be seen as a way to keep driving performance improvements at a time when data center density and component heat generation are both rising. Microsoft noted that CPUs that once drew 150 watts of electrical power per chip now exceed 300 watts, with some GPUs jumping to more than 700 watts.
“Air cooling is not enough,” Christian Belady, distinguished engineer and vice president of Microsoft’s data center advanced development group, said in a blog post outlining the company’s latest efforts. Heat transfers in liquids are much more efficient than air, he noted. “That’s what’s driving us to immersion cooling, where we can directly boil off the surfaces of the chip. … Liquid cooling enables us to go denser, and thus continue the Moore’s Law trend at the data center level.”
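Belady’s point is easy to sanity-check with a back-of-the-envelope comparison. The short Python sketch below uses rounded textbook property values for air and water at room temperature (assumptions for illustration, not figures from Microsoft or any vendor): for the same volumetric flow and the same temperature rise, water carries away roughly 3,500 times as much heat as air.

```python
# Back-of-the-envelope comparison of air vs. water as coolants.
# Property values are rounded room-temperature textbook figures,
# used here purely for illustration.

AIR_DENSITY = 1.2           # kg/m^3
AIR_SPECIFIC_HEAT = 1005    # J/(kg*K)
WATER_DENSITY = 997         # kg/m^3
WATER_SPECIFIC_HEAT = 4186  # J/(kg*K)

def heat_removed_watts(density, specific_heat, flow_m3_per_s, delta_t_k):
    """Heat carried off by a coolant stream: Q = rho * V_dot * c_p * dT."""
    return density * specific_heat * flow_m3_per_s * delta_t_k

flow = 0.001   # 1 liter of coolant per second
delta_t = 10   # coolant warms by 10 degrees as it passes the hardware

q_air = heat_removed_watts(AIR_DENSITY, AIR_SPECIFIC_HEAT, flow, delta_t)
q_water = heat_removed_watts(WATER_DENSITY, WATER_SPECIFIC_HEAT, flow, delta_t)

print(f"Air:   {q_air:,.0f} W")            # ~12 W
print(f"Water: {q_water:,.0f} W")          # ~41,700 W
print(f"Ratio: ~{q_water / q_air:,.0f}x")  # ~3,460x
```

Real systems add pumps, heat exchangers and fan power that a sketch like this ignores, but that gap in heat-carrying capacity is what lets liquid-cooled designs pack far more power into the same rack space.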
Turn to Liquid
The power efficiency challenge in data centers has been a key focus for IT staff and tech vendors for more than a decade. As they’ve added more and faster cores to their processors, Intel, AMD and other chip makers have also put in place management and control capabilities aimed at balancing power, core count and chip speeds.
“As the thermal envelope of these AI and cloud computing chips has already increased dramatically, IT and data center cooling architectures are changing along with it,” said Joe Capes, co-founder and CEO of LiquidStack, a specialist in two-phase immersion cooling technologies similar to what Microsoft is testing in one of its Azure data centers.
Microsoft’s Two-Phase Immersion
In Microsoft’s process, heat from the submerged systems causes the fluid to boil, turning it into vapor. As the vapor rises, it carries the heat to a condenser, where the heat is removed and the vapor returns to liquid form, falling back into the tank. The fluid boils at 122 degrees Fahrenheit, about 90 degrees lower than the boiling point of water.
Microsoft is working with data center IT system manufacturer Wiwynn to develop the two-phase immersion technology, which is now running in the company’s Azure data center in Quincy, Wash. The fluid, developed by 3M, has dielectric properties that enable servers to operate normally while fully immersed.
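The appeal of boiling the fluid, rather than merely warming it, is that vaporization absorbs a large amount of heat per unit of fluid. Microsoft has not published the fluid’s thermal properties, so the latent-heat figure in the rough sketch below is an assumed value typical of engineered dielectric coolants, used only to show the scale involved.

```python
# Rough heat budget for two-phase immersion cooling: server heat is absorbed
# as latent heat when the dielectric fluid boils, then released again when
# the vapor condenses and falls back into the tank, so no fluid is consumed.
# The latent heat below is an assumption (on the order of 100 kJ/kg, typical
# of engineered dielectric coolants), not a published figure for the 3M fluid.

LATENT_HEAT_J_PER_KG = 100_000   # assumed heat of vaporization
BOILING_POINT_F = 122            # per Microsoft, about 90 F below water

def boil_off_rate_kg_per_s(heat_load_w, latent_heat=LATENT_HEAT_J_PER_KG):
    """Mass of fluid that must vaporize each second to absorb a heat load."""
    return heat_load_w / latent_heat

rack_load_w = 50_000  # hypothetical rack dissipating 50 kW
rate = boil_off_rate_kg_per_s(rack_load_w)
print(f"~{rate:.1f} kg of fluid vaporizes per second")  # ~0.5 kg/s
```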
Microsoft found that the cooling system reduced a server’s power consumption by 5 percent to 15 percent, a meaningful saving when running power-hungry workloads such as AI and other HPC applications.
Lenovo Cools the Systems
Lenovo’s Neptune initiative spans multiple liquid cooling technologies. One is a liquid-to-air heat exchanger that brings a water loop to the rear door of a rack, absorbing heat from the exhaust of the air-cooled systems inside. That technology is used in one of two configurations of the new ThinkSystem SR670 V2, which is aimed at HPC and AI training workloads and supports up to eight Nvidia GPUs in small or large form factors.
In addition, Lenovo is expanding its direct water-cooling technology – which has been used in previous servers to cool such components as CPUs and memory – to GPUs in its new ThinkSystem SD650-N V2 server, which combines two Xeon Ice Lake chips with four Nvidia A100 GPUs in a highly dense 1U form factor. With this direct-to-node technology, warm water is piped into the system to cool the components.
In its ultra-dense SD630 V2 server, Lenovo is using its Neptune Thermal Transfer Modules, which integrate hermetically sealed heat pipes filled with liquid instead of traditional heat sinks. With that, the 1U server can support processors with up to 250W of power consumption, which officials said delivers 1.5 times the performance of the previous generation in the same rack space.
Liquid Cooling Is Industry-Wide
Other system vendors and cloud providers also are leveraging liquid cooling for some of their systems. Hewlett Packard Enterprise worked with Asetek to bring the technology to some of its Apollo servers, and Dell has done the same with CoolIT for some PowerEdge systems. IBM for years has offered rear-door heat exchanger technology for some servers, and Fujitsu has its own liquid immersion technologies.
Google Cloud in 2018 began introducing liquid cooling into its operations to cool the TPU chips it uses for AI workloads. In addition, Microsoft’s most recent work in this area is not its first. As part of its Project Natick, the company in 2018 sank a large, sealed container holding a data center to the seafloor at a depth of 117 feet to test not only the ocean’s cooling capabilities but also how the equipment could be protected against environmental impacts like temperature fluctuations, humidity and the jostling that comes with human interaction in traditional data centers. The company pulled the shipping-container-size enclosure back up last year and is evaluating the results.
Interest in liquid cooling in the data center is growing. A report from ResearchandMarkets projects that the global market will expand an average of 22.6 percent a year, from $1.2 billion in 2019 to $3.2 billion in 2024, driven by demand for greater power efficiency, lower operating costs and better overclocking capabilities.
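Those figures hang together arithmetically; a quick back-of-the-envelope check (not from the report itself) shows that growing $1.2 billion at roughly that rate for five years lands near $3.2 billion.

```python
# Consistency check on the ResearchandMarkets projection.
start_billions = 1.2
cagr = 0.226
years = 2024 - 2019

projected = start_billions * (1 + cagr) ** years
implied_cagr = (3.2 / 1.2) ** (1 / years) - 1

print(f"$1.2B grown at 22.6% for {years} years: ${projected:.2f}B")  # ~$3.3B
print(f"CAGR implied by $1.2B -> $3.2B: {implied_cagr:.1%}")         # ~21.7%
```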
Challenges Ahead
That growth makes sense, given that liquid can transfer heat much more efficiently than air, Rob Enderle, principal analyst with The Enderle Group, told InternetNews. However, there are hurdles to its adoption. Data center operators tend to be conservative in adopting technologies, even those that have been proven successful over years. Those most willing to explore liquid cooling solutions tend to be in areas that need the highest performance, such as HPC, supercomputing and cloud.
“The issue is that you have to manage the liquid, and it requires the full swap out of everything you intend to cool this way,” Enderle said. “Thus it is more attractive for new installations than for retrofitting old ones. You have to ensure it doesn’t get contaminated by metal – which can move around much more easily in liquid than with air – or anything else that would cause corrosion. This means special training, and given this approach is relatively rare still, a lot of retraining.”
Most companies “still seem to think air cooling is good enough, but you should get more consistent performance and have fewer failures in systems that are cooled this way if they are correctly implemented. That last [issue], due to the lack of existing trained people, is still problematic,” he said.