Hyperscale data center operators like Google, Amazon Web Services (AWS) and Facebook for years have turned to in-house development of hardware systems and components when what they needed was not available from OEMs.
Faced with operating massive and fast-growing data centers and offering a wide array of new services to end users, these hyperscalers are constantly challenged in such areas as performance, power efficiency, latency and density, and they have the resources and money to develop their own solutions.
Google has been active in this area. The company, whose business spans everything from search to the cloud, developed custom hardware – often in conjunction with tech partners – ranging from solid-state drives (SSDs) and hard drives to network switches and network interface cards (NICs). An internal cluster-management system, codenamed “Borg,” eventually became the basis for Kubernetes, the container orchestration platform Google released to the open-source community.
In the area of compute, Google in 2015 launched its Tensor Processing Unit (TPU), an accelerator for artificial intelligence (AI) and machine learning workloads that is aimed at the company’s TensorFlow software. Three years later came Video Processing Units (VPUs) for video distribution, and in 2019 Google created OpenTitan, a security-focused open-source chip.
System on a chip is ‘the new motherboard’
Now the company is making a deeper push into the custom chip space, announcing this week that it has hired longtime Intel chip engineer Uri Frank as vice president of engineering for server chip design. In a blog post, Amin Vahdat, Google Fellow and vice president of systems infrastructure, wrote that the rapid evolution of cloud infrastructure is demanding higher performance and greater power efficiency at a time when Moore’s Law continues to slow.
Google, like other companies, for years has relied on motherboards – integrated with components from CPUs to accelerators to memory – in its servers. However, that design, with the components connected via inches of wire, is no longer able to keep up with demand, Vahdat wrote. Now the company is turning to system-on-a-chip (SoC) designs, where the various functions sit on the same chip or on multiple chips inside a single package.
“In other words, the SoC is the new motherboard,” Vahdat wrote. “On an SoC, the latency and bandwidth between different components can be orders of magnitude better, with greatly reduced power and cost compared to composing individual ASICs on a motherboard. Just like on a motherboard, individual functional units (such as CPUs, TPUs, video transcoding, encryption, compression, remote communication, secure data summarization, and more) come from different sources. We buy where it makes sense, build it ourselves where we have to, and aim to build ecosystems that benefit the entire industry.”
A range of chip manufacturers, including Intel, AMD, NXP and Analog Devices, develop and sell SoCs. ARM’s chip designs, which can be found in most smartphones and other mobile devices, also are SoCs. Frank will lead Google’s efforts in SoC development. He comes to the company after more than two decades at Intel. He joined the world’s top chip maker in 2000 and worked his way up the ranks, reaching director of engineering in 2011, and ending up as corporate vice president of Intel’s Design Engineering Group and general manager of its Core and Client Development Group.
In a statement on LinkedIn, he wrote that “Google has designed and built some of the world’s largest and most efficient computing systems. For a long time, custom chips have been an important part of this strategy. I look forward to growing a team here in Israel while accelerating Google Cloud’s innovations in compute infrastructure.”
Google has become a key player in the booming global public cloud space, trailing AWS and Microsoft Azure, which combined account for more than half of a cloud infrastructure services market that in the fourth quarter of 2020 hit $37 billion, a 35 percent year-over-year increase, according to Synergy Research Group. In 2020, Google Cloud generated more than $13 billion in revenue, up from $8.9 billion the previous year.
Google, like other public cloud providers, also is making a push into on-premises data centers to give it a larger presence in the expanding hybrid cloud space. In 2019, Google introduced Anthos, an effort to enable enterprises to run Google Cloud services on premises and easily move data and applications between private and public clouds.
Roger Kay, principal analyst with Endpoint Technologies Associates, told InternetNews that in a rapidly evolving hardware world where big cloud providers like Google, AWS and Microsoft are increasingly moving toward designing their own data centers, “when a major silicon architect moves from Intel to Google, that’s a seismic event in the industry.”
GPUs, FPGAs and ARM challenge CPUs
The hyperscalers over the past several years have begun to understand that they can custom design a lot of their own chips, systems and other technologies rather than be beholden to Intel, AMD, Dell, Hewlett Packard, Lenovo or other traditional vendors, Kay said. They found that accelerators like GPUs and field-programmable gate arrays (FPGAs) can run workloads like AI and machine learning as well as or better than CPUs from Intel or AMD.
They also understood that with ARM chip licensing and in-house talent, they could design their own chips and have third-party manufacturers like Taiwan Semiconductor Manufacturing Co. (TSMC) make them.
“All the traditional silicon suppliers are pretty nervous about that because these [hyperscalers] are also the biggest customers,” Kay said.
Intel officials for years have argued that ARM doesn’t have the ecosystem of hardware and software partners to make a deep push into the data center. However, ARM has steadily grown an ecosystem and its chip designs have begun to make inroads. In addition, the licensing model lends itself to large companies with resources and money that need their chips to have specific capabilities.
The AI workloads that helped fuel this move toward homemade custom chips may not represent a large market now, but they got everything rolling in that direction, opening up further possibilities.
“It’s a way to get this new design [and] manufacturing system up and going and then once it’s up and going, you can say, ‘Well, what else can we do with this?'” Kay said.
Is Microsoft next?
That has been seen with other hyperscalers as they turn to in-house development of compute and other systems to improve performance and efficiency. More than a decade ago, as it was adding billions of users to its platform, Facebook created an engineering team to develop a highly energy-efficient data center – from servers and power supplies to racks and cooling – and two years later in 2011 helped launch the Open Compute Project (OCP) to share its designs.
The OCP now has myriad projects that address everything from servers and hardware management to networking, racks, power, security, storage, telecommunications and future technologies.
In addition, Facebook in 2019 announced plans to work with partners to develop hardware optimized for such workloads as AI inference and training and video transcoding as well as new ASIC chips for AI inference jobs.
AWS also has developed its own custom technologies. In 2015 – after ending work with AMD to develop a chip alternative to Intel’s offerings – Amazon bought ARM-based chip startup Annapurna Labs for $350 million. Annapurna engineers first began working on an Internet of Things (IoT) gateway and a chipset called Nitro – aimed at networking and storage jobs – before developing Graviton, an ARM-based multi-core SoC that powers some AWS instances. Graviton also can be found in the cloud provider’s Outposts systems, which are housed in users’ on-premises data centers or colocation facilities and deliver AWS services.
Microsoft, for its part, is designing its own custom server chips to run in systems in its Azure cloud, where currently most servers are powered by Intel Xeon processors, according to reports that surfaced last year. Microsoft’s chips reportedly will be based on highly power-efficient ARM designs.