Gian Marco Iodice, Staff Compute Performance Software Engineer at Arm, presents the "Performance Analysis for Optimizing Embedded Deep Learning Inference Software" tutorial at the May 2019 Embedded Vision Summit.
Deep learning on embedded devices is currently enjoying significant success in a number of vision applications, particularly on smartphones, where increasingly prevalent AI cameras are able to enhance every captured moment. However, the considerable number of deep learning network architectures proposed every year has led to real challenges for software developers who need to implement these demanding algorithms efficiently.
In this presentation, Iodice outlines a structured approach to performance analysis of deep learning software implementations. He examines the fundamentals of performance analysis for deep learning, covering metrics and methodologies. He then shows how a top-down approach can be used to detect and fix performance bottlenecks, yielding efficient deep neural network software implementations. He also illustrates typical software optimizations that can be used to make the best use of available computational resources.