This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.
In today’s swiftly evolving technological landscape, where artificial intelligence (AI) is reshaping industries and driving innovation, understanding the intricacies of AI performance metrics is paramount. Previously, many AI models had to run in the cloud. As we move toward a future defined by on-device generative AI processing, we must be able to evaluate the performance, accuracy and efficiency with which compute platforms can run AI models. Today, one of the leading ways of measuring a processor’s AI performance is trillions of operations per second (TOPS). TOPS is a measurement of the potential peak AI inferencing performance based on the architecture and operating frequency of the processor, such as the Neural Processing Unit (NPU). We will dive into that below.
What is the NPU?
Before diving into the specifics of TOPS, let’s examine the NPU’s importance. For on-device AI processing, the NPU plays a pivotal role in driving efficiency and enabling innovative application experiences for individual users and companies. Assessing the performance of these specialized processors requires a comprehensive understanding of the metrics that underpin their capabilities.
The evolution of the NPU has transformed how we approach computing. Traditionally, the CPU was responsible for executing AI algorithms. As the demands for processing performance skyrocketed, dedicated NPUs emerged as a specialized solution for handling software and applications leveraging AI. These processors are designed to efficiently handle the complex mathematical computations required for AI tasks, offering unparalleled efficiency, performance and power savings.
What does AI TOPS mean?
At the heart of NPU performance measurement lies TOPS, a metric that illustrates the sheer computational power of these units.
TOPS quantifies an NPU’s processing capabilities by measuring the number of operations (additions, multiplications, etc.), in trillions, executed within a second.
This standardized measurement strongly indicates an NPU’s performance, serving as a crucial yardstick for comparing AI performance across different processors and architectures. Because TOPS serves as a cornerstone performance metric for NPUs, exploring the parameters that make up the TOPS equation and how they can dictate performance is essential. Doing so can offer a deeper understanding of an NPU’s capabilities.
A multiply-accumulate (MAC) operation executes the mathematical formulas at the core of AI workloads. A matrix multiply is built from repeated pairs of two fundamental operations: a multiplication and an addition into an accumulator. A MAC unit can, for example, run one of each per clock cycle, meaning it executes two operations per clock cycle. A given NPU has a set number of MAC units that can operate at varying levels of precision, depending on the NPU’s architecture.
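To make the MAC counting concrete, here is a minimal sketch (the matrix dimensions are illustrative, not tied to any particular model or NPU) showing why a matrix multiply's operation count is twice its MAC count:

```python
# Minimal sketch: counting the MAC operations in a naive matrix multiply.
# Multiplying an (m x k) matrix by a (k x n) matrix takes m * n * k MACs,
# and each MAC counts as two operations (one multiply, one add).

def matmul_mac_count(m: int, k: int, n: int) -> int:
    """MACs needed to multiply an (m x k) matrix by a (k x n) matrix."""
    return m * n * k

def matmul_op_count(m: int, k: int, n: int) -> int:
    """Total operations, counting each MAC as a multiply plus an add."""
    return 2 * matmul_mac_count(m, k, n)

# Example: a 1024 x 1024 by 1024 x 1024 matrix multiply needs
# 1024**3 MACs, i.e. a bit over two billion individual operations.
macs = matmul_op_count(1024, 1024, 1024)
print(macs)  # 2147483648
```

Transformer and convolutional models execute enormous numbers of such matrix multiplies per inference, which is why MAC throughput dominates NPU design.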
Frequency dictates the clock speed (or cycles per second) at which an NPU and its MAC units (as well as a CPU or GPU) operate, directly influencing overall performance. A higher frequency allows for more operations per unit of time, resulting in faster processing speeds. However, increasing frequency also leads to higher power consumption and heat generation, which impacts battery life and user experience. The TOPS number quoted for processors is generally at the peak operating frequency.
Precision refers to the granularity of calculations, with higher precision typically correlating with increased model accuracy at the expense of computational intensity. The most common high-precision AI models are 32-bit and 16-bit floating point, whereas faster, low-precision, low-power models typically use 8-bit and 4-bit integer precision. The current industry standard for measuring AI inference in TOPS is at INT8 precision.
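The precision trade-off can be illustrated with a toy symmetric INT8 quantization scheme (a simplified sketch; real toolchains also use zero-points, per-channel scales and calibration):

```python
# Hypothetical sketch of symmetric INT8 quantization. Eight bits give only
# 256 representable levels, so each weight is approximated, trading a little
# accuracy for much lower compute and memory cost.

def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Map floats to INT8 using a single symmetric scale factor."""
    scale = max(abs(v) for v in values) / 127  # largest value maps to +/-127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the INT8 codes."""
    return [x * scale for x in q]

weights = [0.91, -0.42, 0.003, -1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered value is close to, but not exactly, the original:
# 0.003 is too small relative to the scale and collapses to 0.0.
```

This is why low-precision inference is both faster and slightly lossy: the hardware moves and multiplies 8-bit integers instead of 32-bit floats, at the cost of rounding error like the collapsed 0.003 above.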
To calculate TOPS, start with OPS, which equals two times the number of MAC units multiplied by their operating frequency. TOPS is the number of OPS divided by one trillion, making it simpler to list and compare, that is, TOPS = 2 × MAC unit count × Frequency / 1 trillion.
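The formula above can be written out directly. The MAC count and frequency below are hypothetical illustration values chosen to land near a familiar TOPS figure, not the specifications of any real NPU:

```python
# Sketch of the TOPS formula: each MAC unit executes two operations
# (a multiply and an add) per clock cycle.

def tops(mac_units: int, frequency_hz: float) -> float:
    """Peak TOPS = 2 * MAC unit count * frequency / one trillion."""
    return 2 * mac_units * frequency_hz / 1e12

# e.g. a hypothetical NPU with 16,384 MAC units clocked at 1.4 GHz:
print(f"{tops(16_384, 1.4e9):.1f} TOPS")  # prints "45.9 TOPS"
```

Note that this is a *peak* figure: it assumes every MAC unit does useful work on every cycle at the maximum clock, which real workloads rarely sustain.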
TOPS and real-world performance
While TOPS provides valuable insights into NPU capabilities, we must still bridge the gap between theoretical metrics and real-world applications.
After all, a high TOPS number alone does not guarantee optimal AI performance; it’s the culmination of various factors working in tandem that genuinely define an NPU’s prowess.
This means considering aspects such as memory bandwidth, software optimization and system integration when evaluating NPU performance. Benchmarks can help us look beyond the numbers and understand how NPUs perform in real-world scenarios, where latency, throughput and energy efficiency matter more than ever.
The Procyon AI benchmark uses real workloads to help translate the theoretical TOPS measurement to the responsiveness and processing capabilities a user can expect in real applications that use AI inferencing. It runs six models at multiple precisions, giving detailed insights into how various NPUs perform. Similar models to these are increasingly common in productivity, media, creator and other applications. Faster performance in Procyon AI and other benchmarks correlates to faster inferencing and better user experiences.
To this end, analyzing real-world performance provides valuable insights into an NPU’s capabilities and limitations. Performance metrics must be scrutinized through the lens of practicality and pragmatism.
The future of NPU performance metrics
As technology continues to advance at a rapid pace — and as the demands of digital transformation continue to shape diverse industries — the landscape of NPU performance metrics is poised for further evolution. While emerging trends are reshaping the way we conceptualize and evaluate NPU performance and computing more broadly, TOPS is a great indicator of performance, and there’s no reason to think it’s going away any time soon.
As various new AI technologies gain traction in the coming years and redefine countless industries, the need for robust performance metrics that capture their unique characteristics will become increasingly pronounced. Adaptability, scalability and relevance to real-world applications will define the future of NPU performance metrics.
Evaluating NPU performance for your needs
Navigating the rapidly changing world of NPU performance measurements may seem daunting at first, but understanding the intricacies of TOPS is vital for industries and individuals alike as digital transformation — especially in the AI space — continues at this pace.
Ultimately, selecting the right system on a chip (SoC) depends on you, your customer or your organization’s workload and priorities — and your decision might very well depend on the SoC’s NPU.
Whether you prioritize raw computational power, energy efficiency or model accuracy, Snapdragon X Series Platforms are equipped with the world’s fastest NPU for laptops at up to 45 TOPS to supercharge your PC and deliver real-world AI experiences into your workflow.
Peter Burns
Sr. Director of Product Marketing, Qualcomm Technologies