This blog post was originally published at Renesas’ website. It is reprinted here with the permission of Renesas.
Vision AI – or computer vision – refers to technology that allows systems to sense and interpret visual data and make autonomous decisions based on an analysis of this data. These systems typically have camera sensors for acquisition of visual data that is provided as input activation to a neural network trained on large image datasets to recognize images. Vision AI can enable many applications like industrial machine vision for fault detection, autonomous vehicles, face recognition in security applications, image classification, object detection and tracking, medical imaging, traffic management, road condition monitoring, customer heatmap generation and so many others.
In my previous blog, Power Your Edge AI Application with the Industry’s Most Powerful Arm MCUs, I discussed some of the key performance advantages of the powerful RA8 Series MCUs with the Cortex-M85 core and Helium that make them ideally suited for voice and vision AI applications. As discussed there, the availability of higher performance MCUs as well as thin neural network models more suited for the resource constrained MCUs used in end point devices, are enabling these sorts of edge AI applications.
In this blog, I will discuss a vision AI application built on the new RA8D1 graphics-enabled MCUs featuring the same Cortex-M85 core and use of Helium to accelerate the neural network. RA8D1 MCUs provide a unique combination of advanced graphics capabilities, sensor interfaces, large memory and the powerful Cortex-M85 core with Helium for acceleration of the vision AI neural networks, making them ideally suited for these vision AI applications.
Graphics and Vision AI Applications with RA8D1 MCUs
Renesas has successfully demonstrated the performance uplift with Helium, in various AI / ML use cases showing significant improvement over a Cortex-M7 MCU – more than 3.6x in some cases.
One such use case is a people detection application developed in collaboration with Plumerai, a leading provider of vision AI solutions. This camera-based AI solution has been ported and optimized for the Helium-enabled Arm Cortex-M85 core, successfully demonstrating both the performance as well as the graphics capabilities of the RA8D1 devices.
Accelerated with Helium, the application achieves a 3.6x performance uplift vs. Cortex-M7 core and 13.6 fps frame rate, a strong performance for an MCU without hardware acceleration. The demo platform captures live images from an OV7740 image-sensor-based camera at 640×480 resolution and presents detection results on an attached 800×480 LCD display. The software detects and tracks each person within the camera frame, even if partially occluded, and shows bounding boxes drawn around each detected person overlaid on the live camera display.
Figure 1: Renesas People Detection AI Demo Platform, showcased at Embedded World 2023
Plumerai people detection software uses a convolution neural network with multiple layers, trained with over 32 million labeled images. The layers that account for the majority of the total latency, are Helium accelerated, such as the Conv2D and fully connected layers, as well as depthwise convolution and transpose convolution layers.
The camera module provides images in YUV422 format which is converted to RGB565 format for display on the LCD screen. The 2D graphics engine integrated on the RA8D1 resizes and converts the RGB565 to ABGR8888 at resolution 256×192 for input to the neural network. The software then converts the ARBG8888 format to the neural network model input format and runs the people detection inference function. The graphics LCD controller and 2D drawing engine on the RA8D1 are used to render the camera input to the LCD screen as well as draw bounding boxes around detected people and present the frame rate. The people detection software uses roughly 1.2MB of flash and 320KB of SRAM, including the memory for the 256×192 ABGR8888 input image.
Figure 2: People Detection AI application on the RA8D1 MCU
Benchmarking was done to compare the latency of Plumerai’s people detection solution as well as the same neural network running with TFMicro using Arm’s CMSIS-NN kernels. Additionally, for the Cortex-M85, the performance of both solutions with Helium (MVE) disabled was also benchmarked. This benchmark data shows pure inference performance and does not include latency for the graphics functions, such as image format conversions.
Figure 3: The Renesas people detection demo based on the RA8D1 demonstrates a performance uplift of 3.6x over the Cortex-M7 core
Figure 4: Inference performance of 13.6 fps @ 480 MHz using RA8D1 with Helium enabled
This application makes optimal use of all the resources available on the RA8D1:
- High-performance 480 MHz processor
- Helium for neural network acceleration
- Large flash and SRAM for storage of model weights and input activations
- Camera interface for capture of input images/video
- Display interface to show the people detection results
Renesas has also demonstrated multi-modal voice and vision AI solutions based on the RA8D1 devices that integrate visual wake words and face detection and recognition with speaker identification. RA8D1 MCUs with Helium can significantly improve neural network performance without the need for any additional hardware acceleration, thus providing a low-cost, low-power option for implementing AI and machine learning use cases.
References and Resources
- RA8D1 product page for technical documentation and samples
- EK-RA8D1 Evaluation Kit and example projects
- Plumerai people detection demo on Renesas RA8D1 MCUs
- Plumerai people detection solution
Kavita Char
Principal Product Marketing Manager, Renesas