Praveen Nayak, Tech Lead at PathPartner Technology, presents the "Using Deep Learning for Video Event Detection on a Compute Budget" tutorial at the May 2019 Embedded Vision Summit.
Convolutional neural networks (CNNs) have made tremendous strides in object detection and recognition in recent years. However, extending the CNN approach to video or volumetric data poses tough challenges, including trade-offs between representation quality and computational complexity, which are of particular concern on embedded platforms with tight computational budgets. This presentation explores the use of CNNs for video understanding.
Nayak reviews the evolution of deep representation learning methods involving spatio-temporal fusion, from C3D to Conv-LSTMs, for vision-based human activity detection. He proposes a decoupled alternative to this fusion, describing an approach that combines a low-complexity predictive temporal segment proposal model and a fine-grained (perhaps high-complexity) inference model. PathPartner Technology finds that this hybrid approach, in addition to reducing computational load with minimal loss of accuracy, enables effective solutions to these high-complexity inference tasks.
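The decoupled two-stage idea can be sketched in code. This is a hypothetical illustration, not PathPartner's actual implementation: all function names and the scoring scheme are assumptions. A cheap proposal stage scores fixed-size temporal segments, and the expensive fine-grained model runs only on segments that clear a threshold, so uneventful video costs almost nothing.

```python
# Hypothetical sketch of a decoupled event-detection pipeline (assumed
# design, for illustration only). Stage 1: a low-complexity model scores
# temporal segments; Stage 2: a heavy fine-grained model runs only on
# segments whose score clears a threshold.

def propose_segments(frame_scores, window=4, threshold=0.5):
    """Group consecutive frames into fixed-size segments and keep those
    whose mean activity score exceeds the threshold."""
    segments = []
    for start in range(0, len(frame_scores), window):
        chunk = frame_scores[start:start + window]
        if sum(chunk) / len(chunk) > threshold:
            segments.append((start, start + len(chunk)))
    return segments

def heavy_inference(segment):
    """Stand-in for the fine-grained classifier (e.g. a 3D CNN); here it
    just labels the segment so the control flow is visible."""
    return {"segment": segment, "label": "event"}

def detect_events(frame_scores):
    # Run the expensive model only on the proposed segments.
    return [heavy_inference(seg) for seg in propose_segments(frame_scores)]
```

For example, a 12-frame clip with high activity scores only in frames 4-7 yields a single proposal, so `heavy_inference` is invoked once instead of three times, which is the source of the compute savings the talk describes.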