Rustem Feyzkhanov, Staff Machine Learning Engineer at Instrumental, presents the “Survey of Model Compression Methods” tutorial at the May 2023 Embedded Vision Summit.
One of the main challenges when deploying computer vision models to the edge is optimizing the model for speed, memory footprint and energy consumption. In this talk, Feyzkhanov provides a comprehensive survey of model compression approaches, which are crucial for harnessing the full potential of deep learning models on edge devices.
Feyzkhanov explores pruning, weight clustering and knowledge distillation, explaining how these techniques work and how to use them effectively. He also examines inference frameworks, including ONNX, TFLite and OpenVINO. Feyzkhanov discusses how these frameworks support model compression and explores the impact of hardware considerations on the choice of framework. He concludes with a comparison of the techniques presented, considering implementation complexity and typical efficiency gains.
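To make the pruning technique mentioned above concrete, here is a minimal sketch of unstructured magnitude pruning using numpy. This is an illustrative example, not code from the talk; the function name, API, and choice of a global magnitude threshold are assumptions for demonstration purposes.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Illustrative unstructured magnitude pruning (not from the talk).

    Zeroes out the fraction `sparsity` of weights with the smallest
    absolute values, a common baseline compression technique.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold is the magnitude of the k-th smallest weight;
    # everything at or below it is set to zero.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune half of a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
```

In practice, frameworks such as TFLite (via the TensorFlow Model Optimization Toolkit) and ONNX-based toolchains provide built-in support for pruning and related compression methods, so a hand-rolled version like this is mainly useful for understanding the idea.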
See here for a PDF of the slides.