Moritz August, CDO at Nomitri GmbH, presents the “Deploying PyTorch Models for Real-time Inference On the Edge” tutorial at the May 2021 Embedded Vision Summit.
In this presentation, August provides an overview of workflows for deploying compressed deep learning models, starting from PyTorch and ending with native C++ application code that runs in real time on embedded hardware platforms. He illustrates these workflows with real-world examples on smartphones, targeting ARM-based CPUs, GPUs, and NPUs, as well as on embedded chips and modules like the NXP i.MX8+ and NVIDIA Jetson Nano.
August examines TorchScript, architecture-side optimizations, quantization, and common pitfalls. Additionally, he shows how the PyTorch deployment workflow can be extended to convert models to ONNX and quantize them using ONNX Runtime. On the application side, he demonstrates how deployed models can be integrated efficiently into a C++ library that runs natively on mobile and embedded devices, and he highlights known limitations.
See here for a PDF of the slides.