Edge AI and Vision Insights: December 18, 2024

LETTER FROM THE EDITOR

Dear Colleague,

On Tuesday, January 28, 2025 at 9 am PT, TechInsights will present the free webinar “Sensing In ADAS and Autonomous Vehicles: What’s Winning, and Why?” in partnership with the Edge AI and Vision Alliance. It’s clear that the number of sensors per vehicle—and the sophistication of these sensors—is growing rapidly, largely thanks to increased adoption of advanced safety and driver assistance features. In this webinar, Ian Riches, Vice President of the Global Automotive Practice at TechInsights, will explore likely future demand for automotive radars, cameras and LiDARs.

Riches will examine which advanced driver assistance system (ADAS) features will drive demand beyond 2030, how changes in vehicle architecture are impacting the market and what sorts of compute platforms these sensors will connect to. He will also share his firm’s vision of the autonomous vehicle landscape, considering scenarios for various automated driving levels and the resulting sensor demand. A question-and-answer session will follow the presentation. For more information and to register, please see the event page.

The Alliance will be taking a holiday break for the next two weeks. Until next time, on behalf of the Alliance, I wish you joy, health and happiness for the holiday season and for the New Year. Happy Holidays!

Brian Dipert
Editor-In-Chief, Edge AI and Vision Alliance

MULTIMODAL LLMs

Unveiling the Power of Multimodal Large Language Models: Revolutionizing Perceptual AI

Multimodal large language models represent a transformative breakthrough in artificial intelligence, blending the power of natural language processing with visual understanding. In this talk, István Fehérvári, Director of Data and ML at BenchSci, delves into the essence of these models. He begins by explaining how large language models (LLMs) work at a fundamental level. He then explores how LLMs have evolved to integrate visual understanding, explains how they bridge the language and vision domains and shows how they are trained. Next, Fehérvári examines the current landscape of multimodal LLMs, including open solutions like LLaVA and BLIP. Finally, he explores the applications that will be enabled by deploying these large models at the edge, identifies the key challenges that stand in the way and highlights what is needed to overcome them.
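How a multimodal LLM bridges the vision and language domains can be illustrated with a minimal sketch. The snippet below follows the general LLaVA-style recipe of projecting visual patch embeddings into the language model’s token-embedding space; the module names, dimensions and toy vision encoder are illustrative assumptions, not details from Fehérvári’s talk.

import torch
import torch.nn as nn

class TinyVisionEncoder(nn.Module):
    """Stand-in for a ViT: splits an image into patches and embeds each one."""
    def __init__(self, patch=16, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, images):                        # images: (B, 3, H, W)
        x = self.proj(images)                          # (B, dim, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)            # (B, num_patches, dim)

class MultimodalBridge(nn.Module):
    """Projects visual embeddings into the LLM's embedding space and joins them with text tokens."""
    def __init__(self, vision_dim=256, llm_dim=512, vocab=32000):
        super().__init__()
        self.vision = TinyVisionEncoder(dim=vision_dim)
        self.projector = nn.Linear(vision_dim, llm_dim)     # the learned vision-to-language "bridge"
        self.token_emb = nn.Embedding(vocab, llm_dim)        # stand-in for the LLM's input embeddings

    def forward(self, images, token_ids):
        visual_tokens = self.projector(self.vision(images))     # (B, P, llm_dim)
        text_tokens = self.token_emb(token_ids)                 # (B, T, llm_dim)
        # The language model then attends over this joint sequence exactly as it would over text alone.
        return torch.cat([visual_tokens, text_tokens], dim=1)   # (B, P + T, llm_dim)

if __name__ == "__main__":
    model = MultimodalBridge()
    images = torch.randn(1, 3, 224, 224)
    token_ids = torch.randint(0, 32000, (1, 8))
    print(model(images, token_ids).shape)   # torch.Size([1, 204, 512]): 196 visual + 8 text tokens

In designs of this kind, often only the projector (and sometimes the LLM) is trained on image-text pairs, which is why the approach is comparatively cheap to adapt; the sketch omits the transformer itself for brevity.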

Also see Fehérvári and colleagues’ presentations in the recording of the recent symposium “Your Next Computer Vision Model Might be an LLM: Generative AI and the Move From Large Language Models to Vision Language Models”, sponsored by the Edge AI and Vision Alliance.

Bridging Vision and Language: Designing, Training and Deploying Multimodal Large Language Models

In this presentation, Adel Ahmadyan, Staff Engineer at Meta Reality Labs, explores the use of multimodal large language models (MLLMs) in real-world edge applications. He begins by explaining how MLLMs (also known as large multimodal models, or LMMs) work and highlighting their key components, giving special attention to how MLLMs merge understanding in the vision and language domains. Next, Ahmadyan discusses the process of training MLLMs and the types of data needed to tune them for specific tasks. Finally, he highlights some of the key challenges in deploying MLLMs in resource-constrained edge devices and shares techniques for overcoming these challenges.
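As one concrete illustration of the kind of edge-deployment technique such talks address, the sketch below applies PyTorch’s post-training dynamic quantization to a toy feed-forward block, shrinking its linear-layer weights to 8-bit integers. It is a generic example under stated assumptions, not a description of Ahmadyan’s specific methods.

import torch
import torch.nn as nn

# Toy stand-in for a transformer block's feed-forward layers.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Linear(2048, 512),
)

# Convert Linear weights to 8-bit integers; activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(model(x).shape, quantized(x).shape)  # same output shape, roughly 4x smaller linear weights

Real multimodal deployments typically combine quantization with pruning, distillation to smaller backbones and careful partitioning of the vision encoder and language model across the available accelerators.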

CLOUD-TO-EDGE MIGRATIONS

Adventures in Moving a Computer Vision Solution from Cloud to Edge

Optix is a computer vision-based AI system that measures advertising and media exposures on mobile devices for real-time marketing optimization. Optix was initially developed as a cloud-based solution, but costs and limitations associated with relying entirely on the cloud drove MetaConsumer to implement an edge solution. In this talk, Nate D’Amico, CTO and Head of Product at MetaConsumer, introduces his company’s application, the role that computer vision plays in it and the challenges of deploying it at scale. He shares the lessons learned from operating a cloud-based solution at scale, the trade-offs that drove MetaConsumer to create an edge-based solution and the hardware and software challenges faced in implementing it.

Using AI to Enhance the Well-being of the Elderly

This presentation from Harro Stokman, CEO of Kepler Vision Technologies, provides insights into an innovative application of artificial intelligence and advanced computer vision technologies in the healthcare sector, specifically focused on nursing homes, hospitals and remote patient monitoring. Stokman explores two challenges the company dealt with in developing its product: first, moving the vision application from 100% cloud-based to on-premises and, finally, to embedded processing on the MOBOTIX c71 camera; and second, collecting and processing visual patient data to train neural networks.

UPCOMING INDUSTRY EVENTS

Sensing In ADAS and Autonomous Vehicles: What’s Winning, and Why? – TechInsights Webinar: January 28, 2025, 9:00 am PT

Embedded Vision Summit: May 20-22, 2025, Santa Clara, California

More Events

FEATURED NEWS

Microchip Technology Expands Its PolarFire FPGA and SoC Solution Stacks with New Offerings for Medical Imaging and Smart Robotics

STMicroelectronics Boosts AI at the Edge with Its New NPU-accelerated STM32 Microcontrollers

Qualcomm Unveils the Snapdragon 8 Elite Mobile Platform

Lattice Semiconductor Advances Its Low Power FPGAs with New Small and Mid-range Products

The MIPI Alliance Announces OEM, Expanded Ecosystem Support for the MIPI A-PHY Automotive SerDes Specification

More News

EDGE AI AND VISION PRODUCT OF THE YEAR WINNER SHOWCASE

Ambarella Central 4D Imaging Radar Architecture (Best Edge AI Software or Algorithm)

Ambarella’s central 4D imaging radar architecture is the 2024 Edge AI and Vision Product of the Year Award Winner in the Edge AI Software and Algorithms category. It is the first centralized 4D imaging radar architecture that allows both central processing of raw radar data and deep low-level fusion with other sensor inputs, including cameras, lidar and ultrasonics. The architecture combines Ambarella’s highly efficient 5 nm CV3-AD AI central domain controller system-on-chip (SoC) with the company’s Oculii adaptive AI radar software. This combination of optimized hardware and software provides the industry’s best AI processing performance per watt, for the lowest possible energy consumption, along with the most accurate and comprehensive AI modeling of a vehicle’s or robot’s surroundings. Ambarella’s Oculii AI radar algorithms uniquely adapt radar waveforms to the environment, achieving high angular resolution (0.5 degrees), an ultra-dense point cloud (tens of thousands of points per frame) and a long detection range of more than 500 meters, while using an order of magnitude fewer antennas than competing 4D imaging radars, reducing data bandwidth and power consumption. The architecture also enables processor-less edge radar heads, further reducing both upfront costs and post-accident expenses (most radar modules are located behind the vehicle’s bumpers).
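As a rough back-of-the-envelope check (not a figure from Ambarella’s materials), the quoted 0.5-degree angular resolution can be translated into cross-range separation at distance using the small-angle approximation:

import math

# Approximate lateral spacing between just-resolvable targets at a given range,
# assuming 0.5 degrees of angular resolution (arc length ~ range * angle in radians).
for range_m in (50, 150, 300, 500):
    cross_range_m = range_m * math.radians(0.5)
    print(f"at {range_m:4d} m: ~{cross_range_m:.1f} m between resolvable targets")

At the quoted 500-meter range, for example, this works out to roughly 4.4 meters of lateral separation between resolvable targets.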

Please see here for more information on Ambarella’s central 4D imaging radar architecture. The Edge AI and Vision Product of the Year Awards celebrate the innovation of the industry’s leading companies that are developing and enabling the next generation of edge AI and computer vision products. Winning a Product of the Year award recognizes a company’s leadership in edge AI and computer vision as evaluated by independent industry experts.


Contact

Address

Berkeley Design Technology, Inc.
PO Box #4446
Walnut Creek, CA 94596

Phone
+1 (925) 954-1411