Josh Morris, Engineering Manager at DSP Concepts, presents the “Comparing ML-Based Audio with ML-Based Vision: An Introduction to ML Audio for ML Vision Engineers” tutorial at the May 2022 Embedded Vision Summit.
As embedded processors become more powerful, our ability to implement complex machine learning solutions at the edge is growing. Vision has led the way, solving problems as far-reaching as facial recognition and autonomous navigation. Now, ML audio is appearing in more and more edge applications, such as voice assistants, voice user interfaces and voice communication systems.
Although audio data is quite different from video and image data, ML audio solutions often use many of the same techniques initially developed for video and images. In this talk, Morris introduces the ML techniques commonly used for audio at the edge, and compares and contrasts them with those commonly used for vision. You'll come away inspired to integrate ML-based audio into your next solution.
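As a concrete illustration of this overlap (a sketch for context, not material from the talk), one common approach is to convert audio into a log-mel spectrogram and treat it as a one-channel image, then classify it with a small CNN of the kind familiar from vision work. The sample rate, mel-band count, and the 10-class output below are assumed values chosen only for the example.

```python
# Illustrative sketch: a log-mel spectrogram fed to a vision-style CNN.
import torch
import torch.nn as nn
import torchaudio

# Assumed parameters: 16 kHz mono audio, 1-second clips, 64 mel bands.
SAMPLE_RATE = 16_000
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=64
)
to_db = torchaudio.transforms.AmplitudeToDB()

classifier = nn.Sequential(          # a small image-style CNN over the spectrogram
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),               # e.g. 10 keyword classes (hypothetical)
)

waveform = torch.randn(1, SAMPLE_RATE)        # stand-in for 1 s of audio
spec = to_db(mel(waveform)).unsqueeze(0)      # shape: (batch=1, 1, 64, frames)
logits = classifier(spec)
print(logits.shape)                           # torch.Size([1, 10])
```

The key point is that once the audio is represented as a time-frequency "image," the downstream model looks much like an edge vision classifier, which is why many vision techniques transfer to audio with little change.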
See here for a PDF of the slides.