LETTER FROM THE EDITOR
Dear Colleague,

I’m pleased to announce the second 2025 Embedded Vision Summit keynote speaker: Gérard Medioni, Vice President and Distinguished Scientist, Amazon Prime Video and Studios. In his address, Medioni will discuss his work on the innovative Just Walk Out technology as well as the Amazon One identity service. Moving to the world of entertainment, the session will also highlight the technology that powers Prime Video, including AI innovations that are improving the streaming experience for over 200 million Prime members worldwide. Attendees will gain insights into how these technologies are reshaping entertainment and will evolve in the coming years to enhance viewer engagement, storytelling and personalization. See here for more details on Medioni’s keynote session.

And, to get a flavor of the types of talks you’ll hear at the 2025 Embedded Vision Summit, check out the videos of the 2024 Summit presentations highlighted below. Then register for the Summit, taking place May 20-22 in Santa Clara, California. We’ll see you there!

Brian Dipert
DEEP LEARNING MODEL TRAINING
Diagnosing Problems and Implementing Solutions for Deep Neural Network Training
In this 2024 Embedded Vision Summit talk, Fahed Hassanat, COO and Head of Engineering at Sensor Cortek, delves into some of the most common problems that arise when training deep neural networks. He provides a brief overview of essential training metrics, including accuracy, precision, false positives, false negatives and F1 score. Hassanat then explores training challenges stemming from poorly chosen hyperparameters, inappropriately sized or otherwise inadequate models, poor-quality datasets, imbalances within training datasets and mismatches between training and testing datasets. To help detect and diagnose training problems, he covers techniques such as interpreting performance curves, recognizing overfitting and underfitting, analyzing confusion matrices and identifying class interaction issues. (Want to learn more about DNN training techniques? Attend the 2025 Embedded Vision Summit! Check out the program here.)
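As a concrete reference point (our illustration, not material from the talk), the short sketch below shows how per-class precision, recall and F1 score fall out of a confusion matrix; the three-class matrix is hypothetical.

```python
# Minimal illustration (not from the talk): deriving per-class precision,
# recall and F1 from a confusion matrix. Rows = true class, columns = predicted.
import numpy as np

def per_class_metrics(confusion: np.ndarray):
    """Return precision, recall and F1 for each class of a square confusion matrix."""
    tp = np.diag(confusion).astype(float)    # correct predictions per class
    fp = confusion.sum(axis=0) - tp          # predicted as class c but actually another class
    fn = confusion.sum(axis=1) - tp          # true class c but predicted as another class
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1

# Hypothetical 3-class confusion matrix, for illustration only.
cm = np.array([[50,  2,  3],
               [ 4, 45,  6],
               [ 1,  5, 44]])
p, r, f1 = per_class_metrics(cm)
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy={accuracy:.3f}")
for c in range(cm.shape[0]):
    print(f"class {c}: precision={p[c]:.3f} recall={r[c]:.3f} F1={f1[c]:.3f}")
```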
Improved Data Sampling Techniques for Training Neural Networks
In this 2024 Embedded Vision Summit presentation, AI engineer Karthik Rao Aroor proposes a novel mini-batch sampling approach for training neural networks with gradient descent on classification problems in which each class contains an equal number of samples. His proposed approach ensures a uniform distribution of samples from all classes in every mini-batch. He shares results showing that this approach yields faster convergence than the random sampling commonly used today. Aroor illustrates his approach using several neural network models trained on widely used datasets, including a truncated version of ImageNet. He also presents results for mini-batch sizes that are large and small relative to the number of classes. Comparing these results to a deliberately suboptimal sampling scheme, he hypothesizes that a uniform distribution of samples from each class within a mini-batch is optimal. His approach benefits model trainers by achieving higher model accuracy with reduced training time. (Interested in learning more about dataset optimization for DNN training? Check out the 2025 Embedded Vision Summit program here, and then register today!)
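The summary above doesn't spell out Aroor's exact procedure, so the sketch below is only an assumption-laden illustration of the general idea: forming mini-batches that draw an equal number of samples from every class, rather than sampling purely at random.

```python
# Hedged sketch (not Aroor's exact method): class-balanced mini-batch sampling.
import numpy as np

def balanced_batches(labels: np.ndarray, num_classes: int, per_class: int, rng=None):
    """Yield index arrays in which each batch holds `per_class` samples from every class."""
    rng = np.random.default_rng(rng)
    # Shuffle the indices of each class independently.
    by_class = [rng.permutation(np.flatnonzero(labels == c)) for c in range(num_classes)]
    batches_per_epoch = min(len(idx) for idx in by_class) // per_class
    for b in range(batches_per_epoch):
        batch = np.concatenate(
            [idx[b * per_class:(b + 1) * per_class] for idx in by_class]
        )
        yield rng.permutation(batch)  # shuffle within the batch

# Toy usage: 4 classes with 100 samples each, mini-batches of 4 * 8 = 32 samples.
labels = np.repeat(np.arange(4), 100)
for batch_idx in balanced_batches(labels, num_classes=4, per_class=8, rng=0):
    counts = np.bincount(labels[batch_idx], minlength=4)
    assert (counts == 8).all()  # every class is equally represented
```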
NEURAL NETWORK EVOLUTION AND OPTIMIZATIONS
Transformer Networks: How They Work and Why They Matter
Transformer neural networks have revolutionized artificial intelligence by introducing an architecture built around self-attention mechanisms. This has enabled unprecedented advances in understanding sequential data, such as human languages, while also dramatically improving accuracy on non-sequential tasks like object detection. In this 2024 Embedded Vision Summit talk, Rakshit Agrawal, Co-Founder and CEO of Ryddle AI, explains the technical underpinnings of transformer architectures, with particular focus on self-attention mechanisms. He also explores how transformers have influenced the direction of AI research and industry innovation. Finally, he touches on ethical considerations and discusses how transformers are likely to evolve in the near future. (Want to learn more about how transformer networks are used? Check out the upcoming 2025 Embedded Vision Summit talk by Ramit Pahwa of Rivian!)
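For readers new to the topic, the minimal sketch below (our illustration, not taken from the talk) shows the scaled dot-product self-attention computation at the core of transformer architectures, for a single attention head in plain NumPy.

```python
# Minimal single-head self-attention sketch, illustrative only.
import numpy as np

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # scaled dot-product similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # attention-weighted mix of values

# Toy usage: 5 tokens with 16-dimensional embeddings and an 8-dimensional head.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 8)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```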
Data-efficient and Generalizable: The Domain-specific Small Vision Model Revolution
Large vision models (LVMs) trained on large and diverse sets of imagery are revitalizing computer vision, just as LLMs did for language modeling. However, LVMs are not nearly as effective when applied to unique types of imagery. To handle labeled data scarcity without overfitting, we need models that are tuned to a specific domain of imagery. Whether it’s a single medical imaging modality, multispectral drone photos or snapshots from a manufacturing line, these specialized applications are best served by a model sized to the available data. A small vision model with fewer parameters generalizes better from limited data, with the added bonus of computational efficiency that lets it run on an edge device. In this 2024 Embedded Vision Summit presentation, Heather Couture, Founder and Computer Vision Consultant at Pixel Scientia Labs, shows why domain-specific models are essential and how they can be trained without labeled data. She concludes by demonstrating the efficacy of domain-specific models in handling small training sets, imbalanced data and distribution shifts for various types of imagery. (To learn more about efficient deep learning at the edge, attend the 2025 Embedded Vision Summit! Check out the program here.)
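Couture's specific training recipe isn't covered in this summary; as one hedged illustration of training without labels, the sketch below implements a SimCLR-style contrastive loss, a common self-supervised objective for pretraining a small vision backbone on unlabeled domain imagery (PyTorch assumed).

```python
# Hedged sketch (an assumption, not Couture's method): SimCLR-style NT-Xent loss
# for self-supervised pretraining on unlabeled imagery.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """z1, z2: (N, d) embeddings of two augmented views of the same N images."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, d), unit length
    sim = z @ z.T / temperature                               # cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                # ignore self-similarity
    # The positive for sample i is its other augmented view, at index i+N or i-N.
    targets = torch.cat([torch.arange(n, device=z.device) + n,
                         torch.arange(n, device=z.device)])
    return F.cross_entropy(sim, targets)

# Toy usage with random tensors standing in for a small backbone's embeddings.
z1 = torch.randn(32, 128, requires_grad=True)
z2 = torch.randn(32, 128, requires_grad=True)
loss = nt_xent_loss(z1, z2)
loss.backward()
print(float(loss))
```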
UPCOMING INDUSTRY EVENTS
Embedded Vision Summit: May 20-22, 2025, Santa Clara, California
FEATURED NEWS
OpenMV Unveils the N6 and AE3 High-performance, Low-power AI Vision Cameras for Makers and Professionals
Renesas Extends Its Mid-Class AI Processor Line-Up with the RZ/V2N for Smart Factories and Intelligent Cities
NVIDIA Announces an Open Physical AI Dataset to Advance Robotics and Autonomous Vehicle Development
Allegro DVT Launches its First AI-based Neural Video Processing IP
An Upcoming In-person Event from Andes Technology Explores the RISC-V Ecosystem