This blog post was originally published at Arm’s website. It is reprinted here with the permission of Arm.
The challenge (and opportunity) of powering workloads in the AI datacenter
AI has the potential to exceed all the transformative innovations of the past century. The benefits to society in health care, productivity, education, and many other areas will be beyond our imagination. To run these complex AI workloads, the amount of compute in the world’s data centers needs to scale exponentially. However, this insatiable need for compute has exposed a critical challenge: the immense power data centers require to fuel this groundbreaking technology.
Today’s data centers already consume enormous amounts of power: globally, 460 terawatt-hours (TWh) of electricity annually, equivalent to the consumption of the entire country of Germany. The rise of AI is expected to triple this figure by 2030, exceeding the total power consumption of India, the world’s most populous country.
Future AI models will continue to become larger and smarter, fueling the need for more compute, which in turn increases demand for power in a self-reinforcing cycle. Finding ways to reduce the power requirements of these large data centers is paramount to achieving these societal breakthroughs and realizing the promise of AI.
In other words, no electricity, no AI.
Companies need to rethink everything to tackle energy efficiency.
Reimagining the future of AI – a future powered by Arm
The power efficiency DNA of Arm – a company whose initial products were designed to run off batteries and sparked the mobile-phone revolution – allows the industry to rethink how chips are built to accommodate these growing demands of AI.
In a typical server rack, the compute chip alone can consume more than 50 percent of the power budget. Engineers are searching for any way to reduce this number; every watt counts.
It’s no surprise that in this search, the world’s largest AI hyperscalers have turned to Arm to reduce power. Arm’s latest Neoverse CPU delivers higher performance and better power efficiency than competing processors for cloud data centers. Neoverse offers hyperscalers the flexibility to customize their silicon to optimize for their demanding workloads, all while delivering leading performance and energy efficiency. Every watt saved enables more compute. This is why Amazon, Microsoft, Google, and Oracle have all adopted Arm Neoverse technology for both general-purpose compute and CPU-based AI inference and training. Arm Neoverse is on the path to becoming the de facto standard across cloud data centers.
Consider the data from recent announcements:
- AWS Arm-based Graviton: 25 percent faster performance for Amazon SageMaker AI inference, 30 percent faster for web applications, 40 percent faster for databases, and 60 percent more efficient than the competition.
- Google Cloud Arm-based Axion: 50 percent more performance and 60 percent better energy efficiency compared to legacy competing architectures, powering CPU-based AI inference and training, YouTube, and Google Earth, among other services.
- Microsoft Azure Arm-based Cobalt: 40 percent performance improvement over the competition, powering services such as Microsoft Teams and coupling with Maia accelerators to drive Azure’s end-to-end AI architecture.
- Oracle Cloud Arm-based Ampere Altra Max: 2.5 times more performance per rack of servers at 2.8 times less power than traditional competition, used for generative AI inference workloads such as summarization, tokenization of data for LLM training, and batched inference.
It’s evident that Arm Neoverse has enabled vast improvements in performance and power efficiency for general-purpose compute in the cloud. However, customers are now finding the same benefits in accelerated computing. Large-scale AI training requires unique accelerated computing architectures, like the NVIDIA Grace Blackwell platform (GB200), which combines NVIDIA’s Blackwell GPU architecture with the Arm-based Grace CPU. This Arm-based computing architecture enables system-level design optimizations that, for LLMs, reduce energy consumption by a factor of 25 and deliver a 30x increase in performance per GPU compared to NVIDIA H100 GPUs built on competing architectures. These optimizations, which deliver game-changing performance and power savings, are only possible thanks to the unprecedented flexibility for silicon customization that Arm Neoverse enables.
As Arm deployments broaden, these companies could save upwards of 15 percent of total data center power. Those enormous savings could then be used to drive additional AI capacity within the same power envelope rather than adding to the energy problem. To put this in perspective, the savings could run 2 billion additional ChatGPT queries, power a quarter of all daily web search traffic, light 20 percent of American households, or power a country the size of Costa Rica.
That’s a staggering impact on both energy consumption and environmental sustainability.
At a foundational level, Arm CPUs are powering the AI revolution while benefiting the planet.
The future of AI compute is built on Arm.
Rene Haas
CEO, Arm