This blog post was originally published by Bitfury. It is reprinted here with the permission of Bitfury.
We are in the middle of one of the largest technological revolutions in history. The growing popularity of the Internet of Things, combined with significant advances in artificial intelligence, will culminate in a wave of digital disruption, changing (and improving) many aspects of our online and offline worlds. But for this future to take hold, the way we design hardware will need to change as well — namely, shifting from the cloud to the edge and from traditional computing designs to AI architectures.
In this article, I will summarize the latest trends and changes in computing related to AI and IoT, as well as provide some essential metrics for companies seeking to build solutions for AI/IoT devices.
About the Internet of Things
More than 125 billion “things” are expected to be connected to the internet (IoT) by 2030* — from smartphones, to cameras, to smart-home devices and more. Each of these devices, down to the smallest sensor, is creating exponentially more data for analysis, data we previously could not use because of its massive scale. Now, with the advent of artificial intelligence, this data can be analyzed properly, giving us even deeper insights and better services. To do so, however, the data needs to be analyzed by AI applications as quickly and securely as possible, returning results almost immediately (a home security system issuing an alert, or a smart thermostat lowering the temperature once everyone in the house has left for work). For these devices to work as they should, they need to be computing “at the edge,” as close to the device’s location as possible, with unrivaled security and efficiency. This is creating unprecedented demand for AI computing products that can deliver real, game-changing results for device/AI applications at the edge.
Consider the Design: Cloud Computing vs. the Edge
The products available on the market today for AI computing were designed primarily for cloud computing operations. The cloud computing sector, characterized by large regional data centers, has relatively few cost, power and scalability constraints. Data centers can always be expanded, and the market for their services means they can operate profitably almost anywhere in the world. As a result, suppliers to this sector have, for years, designed and sold expensive and non-optimized technologies. One reason is that many cloud computing applications (such as running your Google Drive or streaming your Netflix) simply do not need innovative designs. Hardware based on standard computing/graphics architectures works just fine.
As the desire for AI applications grew, it became apparent that these architectures and data centers could not handle AI workloads, much less AI workloads at “the edge.” Start-ups have now flooded the market, trying to produce hardware that can better process AI applications. However, for companies looking to capitalize on this new generation of innovation through their own AI/IoT applications, it is often difficult to determine which products can deliver the required computational power and results. AI hardware is a new market, crowded with newcomers and institutional giants offering diverse products based on different approaches to computing.
Bitfury’s expert AI team (hailing from Intel, ASUS Group and more) has compiled the following key metrics that we believe are essential when considering the tradeoffs between the effectiveness, usability and convenience of a hardware provider for AI applications. We hope this helps your company better analyze and understand which product is best for you.
1. The AI Software Stack and Tools
Key Takeaway: You should carefully analyze the complete software stack of your AI processor, ensuring it is integrated with existing open-source AI frameworks.
TensorFlow, Caffe and PyTorch are the most popular open-source AI development frameworks. Support for CPU and GPU designs is natively integrated into the backend of these frameworks, but many new AI processors are not. This means that the software stack for many AI processors takes the output of these frameworks as the input to a custom compiler and uses it to create the binary executable file. When analyzing the software stack of your chosen AI processor, be sure to check where compression, pruning, precision conversion, and retraining (if needed) are performed, and assess the impact on your development flow. In some cases, data scientists have to switch from the development framework to the custom compiler and back to the framework until the required performance is achieved. It is also critical to evaluate the pre-trained networks provided and the available model zoo. We also recommend you have access to an on-premises or online simulator, which can provide a good preview of the expected performance of your neural network on the targeted hardware. A typical hand-off to a vendor toolchain is sketched below.
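A common pattern is to train in an open-source framework, export the model to an interchange format such as ONNX, and feed that file to the vendor’s custom compiler. The following is a minimal sketch, assuming a PyTorch workflow; the vendor_sdk calls are hypothetical placeholders for whatever toolchain your supplier actually provides.

```python
# Minimal sketch of a framework-to-vendor-compiler hand-off.
# The "vendor_sdk" calls are hypothetical placeholders.
import torch
import torchvision

# 1. Train or load the network in a standard open-source framework.
model = torchvision.models.mobilenet_v2().eval()

# 2. Export the trained graph to ONNX, a common interchange point.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "mobilenet_v2.onnx")

# 3. Hand the exported graph to the vendor's custom compiler, which
#    produces the binary executable for the AI processor. If accuracy
#    or performance falls short, you loop back to step 1 and retrain.
# binary = vendor_sdk.compile("mobilenet_v2.onnx", precision="int8")
# report = vendor_sdk.simulate(binary, calibration_images)
```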
2. Neural Network Precision
Key Takeaway: Review the precision requirements carefully — otherwise you may need to spend valuable time and resources retraining your network.
Most accelerators for edge inference work with reduced precision. This is because converting the network to a lower precision simplifies the hardware implementation and dramatically reduces power consumption, which is critical for an edge application. While CPUs/GPUs calculate at 32-bit floating-point precision, most new inference accelerators work at INT8/INT12/INT16 precision, and some with bfloat16. This means that all values must be remapped to a lower precision; to keep accuracy high, it is often necessary to retrain the network.
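To make the remapping concrete, here is a minimal sketch of symmetric, per-tensor INT8 quantization, the kind of conversion these accelerators rely on. Production toolchains typically add calibration data and per-channel scales.

```python
# Sketch of symmetric per-tensor float32 -> INT8 quantization.
import numpy as np

def quantize_int8(x: np.ndarray):
    # Choose a scale so the largest magnitude maps onto the int8 range.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
error = np.abs(weights - dequantize(q, scale)).max()
print(f"max quantization error: {error:.5f}")
```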
3. Accuracy
Key Takeaway: Review the accuracy requirements of your application against the design of the processor to ensure the two are compatible.
In many AI applications, accuracy is a key element. Thanks to advances in neural network development, it is possible today to maintain high accuracy while reducing precision to INT16/INT12/INT8. With many neural networks, after retraining, there is no significant loss in inference accuracy between 32-bit floating point and 16-bit integer. Some suppliers also provide 12-bit resolution with almost the same accuracy as 32-bit floating point. In many non-critical applications, 8-bit precision is enough because it preserves 98–99.5% of the accuracy of the same model at 32-bit floating point.
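Before committing to a processor, it is worth measuring this gap yourself. The sketch below compares a toy FP32 network against its dynamically quantized INT8 counterpart; in practice you would run your own evaluation set and metric rather than random inputs.

```python
# Self-contained sketch: compare FP32 vs INT8 outputs for a toy network.
# Substitute your real model and evaluation data to measure true accuracy loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
model_fp32 = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)
).eval()

# Dynamic quantization converts the Linear layers to INT8.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(256, 64)
with torch.no_grad():
    drift = (model_fp32(x) - model_int8(x)).abs().mean()
print(f"mean output drift after INT8 quantization: {drift:.5f}")
```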
4. Performance & Throughput
Key Takeaway: Don’t be sold on “peak tera operations per second” alone. Instead, ask for access to a simulator that can show how the chip performs on your specific network design.
Tera operations per second (TOPS) is commonly used to state the peak performance of a specific architecture. Today, an AI chip for end-node or low-power AI edge appliances can deliver 0–10 TOPS, while 10–50 TOPS is more commonly found in powerful edge appliances and edge data centers. Although “peak TOPS” is a good marketing tool, it does not necessarily indicate the real performance an AI chip can deliver on your neural network. We’ve seen cases where one neural network performs radically differently on two chip designs that boast the same TOPS. This is because, depending on the design of the chip, its utilization can change dramatically. Some networks may use only 10–20% of the chip’s resources, while others may reach 80–90%. Run your network on an online or on-premises simulator to see what real TOPS can be delivered in your case.
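The arithmetic behind this caveat is simple: effective throughput is peak TOPS scaled by the utilization your network actually achieves on that architecture. The utilization figures below are purely illustrative.

```python
# Effective throughput = peak TOPS x achieved utilization.
# Utilization figures here are illustrative, not measured.
peak_tops = 10.0

for network, utilization in [("network A", 0.15), ("network B", 0.85)]:
    effective_tops = peak_tops * utilization
    print(f"{network}: {effective_tops:.1f} effective TOPS "
          f"out of {peak_tops:.0f} peak")
```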
For image workloads, good metrics for judging an AI chip include frames per second (f/s), which gives you a good indication of peak performance, and efficiency, measured in frames per second per watt (equivalently, frames per joule).
Note: Performance and efficiency can vary based on the size of the image batch. The MLPerf website (https://mlperf.org/) gives a very good overview of the measured performance of each chip.
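As a quick illustration of how these two metrics are computed from a benchmark run (all figures are hypothetical):

```python
# Throughput and efficiency from a hypothetical benchmark run.
frames = 10_000        # images processed
elapsed_s = 20.0       # wall-clock time (s)
avg_power_w = 5.0      # average board power (W)

fps = frames / elapsed_s
frames_per_joule = fps / avg_power_w   # f/s per W == frames per joule
print(f"{fps:.0f} f/s, {frames_per_joule:.0f} frames/J")
```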
5. Latency
Key Takeaway: Consider what threshold for latency your AI application requires before selecting an AI processor — do not sacrifice other performance for ultra-low latency if it is not required.
Latency can be a critical parameter for AI edge applications. For example, the AI computing unit in a vehicle must have very low latency, on the order of a few milliseconds, to process all data and react quickly. The same holds for a data center application where a massive amount of data must be processed quickly. In many other edge applications, latency is not a decisive factor. If it is for your application, ensure your processor has the lowest latency possible.
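When latency does matter, measure the tail, not just the average. A minimal measurement loop might look like the sketch below, where infer() is a placeholder for a call into your accelerator’s runtime.

```python
# Sketch of a per-inference latency check against a budget.
# infer() is a placeholder for your accelerator runtime call.
import time
import numpy as np

LATENCY_BUDGET_MS = 5.0
weights = np.random.randn(256, 10)

def infer(x):
    return x @ weights  # stand-in for real inference

x = np.random.randn(1, 256)
samples_ms = []
for _ in range(1000):
    t0 = time.perf_counter()
    infer(x)
    samples_ms.append((time.perf_counter() - t0) * 1000.0)

p99 = float(np.percentile(samples_ms, 99))
print(f"p99 latency: {p99:.3f} ms (budget: {LATENCY_BUDGET_MS} ms)")
```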
6. Power Consumption and Thermal Design Power
Key Takeaway: Power consumption is a critical factor to consider when choosing a product, but be aware of the tradeoff between low power consumption and small hardware size on one side and heat dissipation on the other.
Power consumption is a critical factor for end-node and edge devices. Some of these devices are battery powered, which imposes strict constraints on consumption. For many business-to-business IoT solutions, the key factor is the thermal design power (TDP) of the chip. Improvements in manufacturing processes mean many companies can deliver a powerful chip in small dimensions. Although this allows for cost reduction and simple integration into small appliances, it introduces the problem of dissipating the heat over a very limited area. It’s important to consider this factor when selecting the chip and designing the complete system.
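A rough power-density comparison shows why small packages make thermal design harder even at much lower TDP. The figures below are hypothetical.

```python
# Power density (W/cm^2) for two hypothetical form factors.
designs = [
    ("small edge module", 5.0, 10 * 10),   # TDP (W), package area (mm^2)
    ("data-center card", 75.0, 60 * 60),
]
for name, tdp_w, area_mm2 in designs:
    w_per_cm2 = tdp_w / (area_mm2 / 100.0)
    print(f"{name}: {w_per_cm2:.1f} W/cm^2 to dissipate")
```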
7. Lifecycle
Key Takeaway: Be sure to check the lifecycle of all components in the AI processing unit, especially if some come from third-party manufacturers. You may need to reach out to those manufacturers individually to confirm future availability.
In the B2B market, it is extremely important to select components with a guaranteed lifecycle of 3–5 years. If the AI processing unit is integrated with third-party components such as DDR/LPDDR, we suggest getting confirmation of production availability from the original memory manufacturer.
8. Suppliers
Key Takeaway: As in most businesses, it can be time-consuming and expensive to change suppliers. Be sure you are comfortable with your choice of AI processor provider before committing.
Once you’ve selected a processor, switching from one supplier to another can be a tough task. The typical reasons to consider a new supplier are a significant price advantage and/or a significant performance advantage. These benefits, however, hold only if the transition is smooth and nothing in the hardware and software integration disrupts your product or process. The available tools, support, real performance, marginal costs and total cost of ownership should all be weighed seriously before selecting a supplier, as they may be negatively impacted if you change suppliers later.
9. Cost
Key Takeaway: Consider the total cost of ownership, the “real” cost of selecting your hardware, and be sure you are not overlooking “hidden” cost items that will present themselves later on.
Cost is always one of the most important parameters in selecting hardware. We recommend you evaluate the cost of the processing unit in relation to the other metrics mentioned in this article, considering the “real cost” at each level of throughput/performance. In general, we suggest you consider the overall total cost of ownership of the hardware, with particular attention paid to the costs of software integration and neural network porting. These costs are often hidden from the purchasing manager but very visible to the R&D and product management departments.
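A simple back-of-the-envelope model makes the point. All figures below are hypothetical, but note how the “hidden” engineering items can rival the silicon itself.

```python
# Hypothetical total-cost-of-ownership breakdown.
units = 10_000
unit_price = 25.0            # per-chip purchase price ($)
sw_integration = 150_000.0   # porting your stack to the vendor toolchain ($)
nn_porting = 60_000.0        # quantizing/retraining, per network ($)
networks = 3

tco = units * unit_price + sw_integration + networks * nn_porting
print(f"TCO: ${tco:,.0f} -> ${tco / units:.2f} per unit "
      f"(vs ${unit_price:.2f} sticker price)")
```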
All of these metrics are crucial to consider when making your AI hardware decision. If you select the right processor, your product can perform better and create market-capturing value for you and your customers.
On our side, we will continue to evaluate and issue insights about the AI hardware market — and will be sharing more shortly about our own core AI products and cloud-to-edge solutions. To stay updated, sign up for our newsletter at https://bitfury.com/ai.
Citations and Further Reading:
*IHS Markit, Cisco, OpenAI, Barclays market research, McKinsey, Gartner, Bitfury analysis
“AI: Built to Scale, from Experimental to Exponential,” Awalegaonkar, Berkey, Douglass, Reilly, 2019, Accenture. https://ai-zurich.ch/wp-content/uploads/2019/12/Accenture-Built-to-Scale-PDF-Report.pdf
Fabrizio Del Maffeo
Head of Artificial Intelligence, Bitfury