LETTER FROM THE EDITOR
Dear Colleague,

We’re excited to announce the 2025 Embedded Vision Summit keynote: “The Future of Visual AI: Efficient Multimodal Intelligence,” presented by Trevor Darrell, Professor at the University of California, Berkeley. Professor Darrell will present groundbreaking research on the revolution in visual AI driven by breakthroughs such as large language models that act as human-like visual reasoning coordinators and vision-language models (VLMs) that integrate natural language understanding with vision. Attendees will gain insights into how emerging techniques can be used to train vision models without labeled data and to enable robots to act in novel situations. Professor Darrell’s work addresses challenges such as memory and compute limitations, focusing on making VLMs more efficient while maintaining accuracy. His keynote will also explore how multimodal AI and prompt-tuned reasoning enable consumers to use visual intelligence at home while preserving privacy.

Check out the keynote abstract and Professor Darrell’s bio, peruse the available event pass options, and then register today for the Summit, taking place May 20-22 in Santa Clara, California, using discount code SUMMIT25-NL for 25% off. We look forward to seeing you there!

Brian Dipert
OVERCOMING VISUAL COMPROMISES
Improved Navigation Assistance for the Blind via Real Time Edge AI

In this 2024 Embedded Vision Summit talk, Aishwarya Jadhav presents recent work on AI Guide Dog, a groundbreaking research project aimed at providing navigation assistance for the blind community. This multi-year project at Carnegie Mellon University leverages AI to predict sighted human reactions in real time and convey this information audibly to blind individuals, overcoming the limitations of existing GPS apps and mobility tools for the blind. Jadhav discusses the various vision-only and multimodal models evaluated. She also discusses imitation learning approaches currently being explored. In addition, she highlights trade-offs among the strict requirements for models to ensure explainable predictions, high accuracy and real-time processing on mobile devices. And she shares insights gained through three iterations of this project, explaining data collection procedures, training pipelines and cutting-edge vision and multimodal modeling methodologies. She concludes with some exciting results.
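As a rough illustration of the imitation-learning framing Jadhav describes (not the AI Guide Dog implementation itself), a behavior-cloning setup treats the heading a sighted walker actually took as the label for each camera frame; the model, shapes and names below are hypothetical:

```python
# Minimal behavior-cloning sketch: a vision model learns to imitate
# the recorded heading choices of sighted walkers. Hypothetical
# architecture and shapes; not the AI Guide Dog implementation.
import torch
import torch.nn as nn

class HeadingPredictor(nn.Module):
    """Maps a camera frame to a discrete heading command
    (e.g., left / straight / right)."""
    def __init__(self, num_commands: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, num_commands)

    def forward(self, frames):          # frames: (B, 3, H, W)
        return self.head(self.backbone(frames))

model = HeadingPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(frames, expert_actions):
    """One supervised step on a (frame, action) batch, where
    expert_actions are the headings sighted walkers actually took."""
    optimizer.zero_grad()
    loss = loss_fn(model(frames), expert_actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```

One appeal of this framing for navigation assistance is that the supervision comes from recorded walks rather than manual annotation, which fits the data collection procedures the talk describes.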
Removing Weather-related Image Degradation at the Edge

For machines that operate outdoors—such as autonomous cars and trucks—image quality degradation due to weather conditions presents a significant challenge. For example, snow, rainfall and raindrops on optical surfaces can wreak havoc on machine perception algorithms. In this 2024 Embedded Vision Summit presentation, Ramit Pahwa, Machine Learning Scientist at Rivian, explains the key challenges in restoring images degraded by weather, such as the lack of annotated datasets and the need for multiple models to address different types of image degradation. Pahwa also introduces metrics for assessing image degradation. He then explains Rivian’s solutions and shares results, demonstrating the efficacy of transformer-based models and of a novel, language-driven, all-in-one model for image restoration. Finally, he highlights the techniques used to create efficient implementations of Rivian’s models for deployment at the edge—including quantization and pruning—and shares lessons learned from implementing these models on a target processor.
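Quantization and pruning are standard model-compression steps for edge deployment; the sketch below applies them to a stand-in convolutional model using stock PyTorch utilities (Rivian’s actual models and toolchain are not described here):

```python
# Sketch of the two compression steps mentioned above, applied to a
# placeholder model (not Rivian's restoration network).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(                 # placeholder restoration model
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)

# 1) Magnitude pruning: zero the 30% smallest-magnitude weights in
#    each conv layer, then bake the sparsity into the weight tensors.
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")

# 2) Quantization: dynamic int8 quantization is the simplest PyTorch
#    API, though it only rewrites nn.Linear/nn.LSTM modules; conv-heavy
#    models are normally quantized statically or through the target
#    processor's own toolchain instead.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 64, 64)          # dummy degraded image
restored = quantized(x)
print(restored.shape)                  # torch.Size([1, 3, 64, 64])
```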
ITERATIVE DEEP LEARNING MODEL TRAINING
Federated ML Architecture for Computer Vision in the IoT Edge

In this 2024 Embedded Vision Summit talk, Akram Sheriff, Senior Manager for Software Engineering at Cisco, begins by introducing federated learning (FL) for computer vision in IoT edge applications. Federated learning is an approach to machine learning that enables collaborative training of deployed models while keeping data decentralized. Sheriff surveys a variety of existing FL architectures and highlights the challenges associated with them, such as statistical heterogeneity across client datasets and system complexity. He then describes a novel FL approach that addresses these challenges for computer vision and IoT edge applications. He shares results comparing this novel approach with existing approaches, highlighting its advantages and limitations. He also shows examples of real-world applications where federated learning is used for data privacy reasons, such as in healthcare. You’ll gain insights into leveraging FL for efficient and privacy-preserving model training in IoT-based computer vision systems.
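To make the “collaborative training with decentralized data” idea concrete, here is a toy federated-averaging (FedAvg) round in NumPy; the clients, data and model are hypothetical stand-ins, not the architecture Sheriff presents:

```python
# Toy federated-averaging round: each client trains locally on its
# private data; only model weights (never raw data) are aggregated.
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    """A few gradient-descent steps on one client's private
    linear-regression data; the data never leaves the client."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One round: broadcast weights, train locally, then average the
    returned weights, weighted by each client's sample count."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, float))

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):                      # 4 edge devices, private data
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print(w)   # approaches [2.0, -1.0] without pooling any raw data
```

Real FL architectures layer secure aggregation, client selection and handling of non-IID data on top of this baseline, which is where the challenges the talk covers arise.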
Continual Learning Through Sequential, Lightweight Optimization

In this 2024 Embedded Vision Summit presentation, Guy Lavi, Managing Partner at Vision Elements, shows how sequential optimization techniques enable continual learning at run time, as new observations flow in. Because these techniques process only the newest batches of observations, new training iterations can be performed on the edge without losing what was learned from the entire pool of observations used in the initial training. Lavi presents detailed examples of this technique, showing how it can be used to optimize a linear function, an image warping algorithm and an object classification neural network.
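For the linear-function case, one classical instance of such a sequential, memory-light update is recursive least squares, which refines the fit from each new observation without storing past data; the sketch below is our illustration, not necessarily Lavi’s formulation:

```python
# Minimal recursive least squares (RLS): the model is updated from each
# incoming observation alone, yet matches the batch least-squares fit
# over all data seen so far. Illustrative only.
import numpy as np

class RecursiveLeastSquares:
    def __init__(self, dim, init_cov=1e3):
        self.w = np.zeros(dim)           # current parameter estimate
        self.P = np.eye(dim) * init_cov  # inverse-covariance proxy

    def update(self, x, y):
        """Incorporate one new observation (x, y); no past data needed."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.w += gain * (y - x @ self.w)
        self.P -= np.outer(gain, Px)

rng = np.random.default_rng(1)
rls = RecursiveLeastSquares(dim=2)
true_w = np.array([0.5, 3.0])
for _ in range(500):                     # observations arrive one by one
    x = rng.normal(size=2)
    y = x @ true_w + 0.05 * rng.normal()
    rls.update(x, y)
print(rls.w)   # close to [0.5, 3.0], learned without storing samples
```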
UPCOMING INDUSTRY EVENTS
Embedded Vision Summit: May 20-22, 2025, Santa Clara, California
FEATURED NEWS
Qualcomm Launches an On-premises AI Appliance Solution and Inference Suite for Diverse Vertical Markets

NVIDIA’s Blackwell GeForce RTX 50 Series Enhances AI Computer Vision and Graphics

STMicroelectronics’ Upgraded Sensor Board Accelerates Plug-and-play Evaluations

NAMUGA Unveils Next-generation Optical Beamforming-based 3D Sensing Solutions

Vision Components Introduces Advanced MIPI Camera Modules with Integrated Image Pre-processing