Edge AI and Vision Insights: September 11, 2024

LETTER FROM THE EDITOR

Dear Colleague,

In a field as rapidly evolving as computer vision and edge AI, the exchange of diverse perspectives is crucial. That’s why, for the first time, we’re delighted to offer a limited number of free guest seats to the quarterly in-person Edge AI and Vision Forum for qualified individuals. It’s coming up on Wednesday, September 18, in Silicon Valley, and I’m excited to announce our invited speakers:

  • Michael Giannangeli from Amazon on using computer vision to enhance Echo smart home devices
  • Xueyan Zou from UC San Diego on combining language and vision for more versatile AI models
  • Chris Padwick from Blue River Technology on John Deere’s use of computer vision to improve farming efficiency
  • David Selinger, co-founder and CEO at Deep Sentinel, on innovation bottlenecks in edge AI applications
  • Jacob Mullins, Managing Director at Shasta Ventures, on conditions, trends, opportunities and challenges for start-ups in the edge AI/CV space

This is a unique opportunity to make useful connections and gain early insights into computer vision and perceptual AI technology and market trends. Interested in attending? Submit your interest here. The Forum is intended for product and business leaders in the edge AI and computer vision ecosystem. Seats are limited. The Alliance will determine eligibility based on factors such as job function and industry.

Brian Dipert
Editor-In-Chief, Edge AI and Vision Alliance

MULTIMODAL PERCEPTION

Understand the Multimodal World with Minimal Supervision

The field of computer vision is undergoing another profound change. Recently, “generalist” models have emerged that can solve a variety of visual perception tasks. Also known as foundation models, they are trained on huge internet-scale unlabeled or weakly labeled data and can adapt to new tasks without any additional supervision or with just a small number of manually labeled samples. Moreover, some are multimodal: they understand both language and images and can support other perceptual modes as well. In this 2024 Embedded Vision Summit keynote presentation, Yong Jae Lee, Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison and CEO of GivernyAI, shares recent groundbreaking research on creating intelligent systems that can learn to understand our multimodal world with minimal human supervision.
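To make the idea concrete, here is a minimal sketch of the kind of zero-shot adaptation such models enable, using a pretrained vision-language model (CLIP) through the Hugging Face Transformers library. The model ID, image path and candidate labels are illustrative, and the talk is not limited to this particular model.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Load a pretrained vision-language foundation model and its preprocessor
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")  # any RGB image
    labels = ["a photo of a cat", "a photo of a dog", "a photo of a tractor"]

    # Score the image against text prompts, with no task-specific training at all
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))

Swapping in a new set of labels retargets the classifier with no retraining, which is what adapting to new tasks without additional supervision means in practice.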

Lee focuses on systems that can understand images and text, and also touches upon those that utilize video, audio and LiDAR. Since training foundation models from scratch can be prohibitively expensive, he discusses how to efficiently repurpose existing foundation models for use in application-specific tasks. Lee also discusses how these models can be used for image generation and, in turn, for detecting AI-generated images. He concludes by highlighting key remaining challenges and promising research directions. You will learn how emerging techniques address today’s neural network training bottlenecks, facilitate new types of multimodal machine perception and enable countless new applications.
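One common recipe for repurposing a frozen foundation model with only a few labeled samples is a linear probe: extract embeddings from the pretrained backbone and fit a lightweight classifier on top. The sketch below assumes CLIP image features and scikit-learn; the file paths and labels are placeholders, and it illustrates the general technique rather than the specific methods covered in the talk.

    import torch
    from PIL import Image
    from sklearn.linear_model import LogisticRegression
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def embed(paths):
        # Image embeddings from the frozen backbone; no gradients, no fine-tuning
        images = [Image.open(p) for p in paths]
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            return model.get_image_features(**inputs).numpy()

    # A handful of manually labeled examples (0 = class A, 1 = class B)
    train_paths = ["class_a_1.jpg", "class_a_2.jpg", "class_b_1.jpg", "class_b_2.jpg"]
    train_labels = [0, 0, 1, 1]

    clf = LogisticRegression(max_iter=1000).fit(embed(train_paths), train_labels)
    print(clf.predict(embed(["query.jpg"])))

Only the small linear classifier is trained, so the cost is negligible compared with training a foundation model from scratch.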

Multimodal LLMs at the Edge: Are We There Yet?

Large language models (LLMs) are fueling a revolution in AI. And, while chatbots are the most visible manifestation of LLMs, the use of multimodal LLMs for visual perception—for example, vision language models like LLaVA that are capable of understanding both text and images—may ultimately have greater impact given that so many AI use cases require an understanding of both language concepts and visual data, versus language alone. To what extent—and how quickly—will multimodal LLMs change how we do computer vision and other types of machine perception? Are they needed for real-world applications, or are they a solution looking for a problem? If they are needed, are they needed at the edge? What will be the main challenges in running them there? Is it the nature of the computation, the amount of computation, memory bandwidth, ease of development or some other factor? Is today’s edge hardware up to the task? If not, what will it take to get there?
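For readers who have not yet experimented with these models, the following minimal sketch queries an openly available LLaVA 1.5 checkpoint through the Hugging Face Transformers library with an image plus a text prompt. The model ID, image path and prompt are illustrative, and an edge deployment would typically use a quantized, hardware-specific build rather than this full-precision setup.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    # The LLaVA 1.5 chat format: an <image> placeholder plus a question
    prompt = "USER: <image>\nIs anyone standing near the machine? Answer briefly.\nASSISTANT:"
    image = Image.open("factory_floor.jpg")

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output_ids[0], skip_special_tokens=True))

Whether today's edge hardware can serve a model of this size responsively is precisely the question the panel takes up.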

This lively and insightful expert panel discussion from the 2024 Embedded Vision Summit answers these and many other questions around the rapidly evolving role of multimodal LLMs in machine perception applications. It is moderated by Sally Ward-Foxton, Senior Reporter at EE Times. The panelists are Adel Ahmadyan, Staff Engineer at Meta Reality Labs; Jilei Hou, Vice President of Engineering and Head of AI Research at Qualcomm Technologies; Pete Warden, CEO of Useful Sensors; and Yong Jae Lee, Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison and CEO of GivernyAI. All have firsthand experience with these models and the challenges associated with implementing them at the edge.

INDUSTRIAL APPLICATIONS

Recent Trends in Industrial Machine Vision: Challenging Times

For decades, cameras have been increasingly used in industrial applications as key components for automation. After two years of rapid growth in 2021 and 2022, the industrial machine vision market paused in 2023, affected by the global economic downturn, which translated into reduced factory investment and lower production rates. At the same time, a strong Chinese machine vision supplier ecosystem has emerged in recent years, establishing itself among the market leaders. In this presentation, Axel Clouet, Technology and Market Analyst for Imaging at the Yole Group, explains how the industrial machine vision ecosystem is adapting to this new reality, both from a supply chain and from a technology perspective.

Operationalizing AI in the Manufacturing Sector

AI at the edge is powering a revolution in industrial IoT, from real-time processing and analytics that drive greater efficiency and learning, to predictive maintenance. Intel is focused on developing tools and assets to help domain experts operationalize AI-based solutions in their fields of expertise. In this talk, Tara Thimmanaik, AI Systems and Solutions Architect at Intel, explains how her company’s software platforms simplify labor-intensive data upload, labeling, training, model optimization and retraining tasks. She shows how domain experts can quickly build vision models for a wide range of processes: detecting defective parts on a production line, reducing downtime on the factory floor, automating inventory management and other digitization and automation projects. And she introduces Intel-provided edge computing assets that enable faster localized insights and decisions, improving labor productivity through easy-to-use tools that democratize AI.
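The talk does not name specific software packages, but to give a flavor of what an operationalized vision model looks like at the edge, here is a minimal inference sketch, assuming a defect-detection model has already been exported to OpenVINO IR format. The model file name, input layout and score threshold are placeholders.

    import cv2
    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("defect_detector.xml")   # placeholder IR file
    compiled = core.compile_model(model, "CPU")      # or "GPU"/"NPU" where available

    frame = cv2.imread("part_on_line.jpg")
    n, c, h, w = compiled.input(0).shape             # assumes a static NCHW input
    blob = cv2.resize(frame, (w, h)).transpose(2, 0, 1)[np.newaxis].astype(np.float32)

    scores = compiled([blob])[compiled.output(0)]    # e.g. [[p(ok), p(defect)]]
    print("defect" if scores[0][1] > 0.5 else "ok")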

UPCOMING INDUSTRY EVENTS

EDGE AI AND VISION PRODUCT OF THE YEAR WINNER SHOWCASE

Qualcomm Snapdragon X Elite Platform (Best Edge AI Processor)

Qualcomm’s Snapdragon X Elite Platform is the 2024 Edge AI and Vision Product of the Year Award Winner in the Edge AI Processors category. The Snapdragon X Elite is the first Snapdragon based on the new Qualcomm Oryon CPU architecture, which outperforms every other laptop CPU in its class. The Snapdragon X Elite’s heterogeneous AI Engine delivers a combined performance of greater than 70 TOPS across the NPU, CPU and GPU, including a powerful integrated NPU capable of up to 45 TOPS. Raw compute is only part of the picture: on-device AI performance also depends on model accuracy and response time, and, for large language models, on generation speed, measured in tokens per second. The Snapdragon X Elite can run a 7-billion-parameter Llama 2 model on-device at 30 tokens per second.

The Oryon CPU subsystem outperforms the competitor’s high-end 14-core laptop chip in peak performance by 60%, and can match the competitor’s performance while using 65% less power. Compared to the leading x86 integrated GPU, the Snapdragon X Elite delivers up to 80% faster performance, and can match the competitor’s highest performance with 80% less power consumption. Developers will also have access to the latest AI SDKs: the Snapdragon X Elite supports all of the leading AI frameworks, including TensorFlow, PyTorch, ONNX, Keras and more.
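For context, tokens per second is simply the number of newly generated tokens divided by wall-clock generation time. A minimal, runtime-agnostic Python sketch (generate_fn is a hypothetical stand-in for whatever generation API a given platform or framework exposes):

    import time

    def tokens_per_second(generate_fn, prompt, max_new_tokens=128):
        # generate_fn is assumed to return only the newly generated token IDs
        start = time.perf_counter()
        new_token_ids = generate_fn(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        return len(new_token_ids) / elapsed

At 30 tokens per second, a 150-token answer arrives in about five seconds.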

Please see here for more information on Qualcomm’s Snapdragon X Elite Platform. The Edge AI and Vision Product of the Year Awards celebrate the innovation of the industry’s leading companies that are developing and enabling the next generation of edge AI and computer vision products. Winning a Product of the Year award recognizes a company’s leadership in edge AI and computer vision as evaluated by independent industry experts.


Contact

Berkeley Design Technology, Inc.
PO Box #4446
Walnut Creek, CA 94596
Phone: +1 (925) 954-1411