Multimodal Large Language Models

LLMs and MLLMs

The past decade-plus has seen incredible progress in practical computer vision. Thanks to deep learning, computer vision is dramatically more robust and accessible, and has enabled compelling capabilities in thousands of applications, from automotive safety to healthcare. But today’s widely used deep learning techniques suffer from serious limitations. Often, they struggle when confronted with ambiguity (e.g., are those people fighting or dancing?) or with challenging imaging conditions (e.g., is that shadow in the fog a person or a shrub?). And, for many product developers, computer vision remains out of reach due to the cost and complexity of obtaining the necessary training data, or due to lack of necessary technical skills.

Recent advances in large language models (LLMs) and their variants, such as vision language models (VLMs, which comprehend both images and text), hold the key to overcoming these challenges. VLMs are an example of multimodal large language models (MLLMs), which integrate multiple data modalities such as language, images, audio, and video to enable complex cross-modal understanding and generation tasks. MLLMs represent a significant evolution in AI by combining the capabilities of LLMs with multimodal processing to handle diverse inputs and outputs.

The purpose of this portal is to facilitate awareness of, and education regarding, the challenges and opportunities in using LLMs, VLMs, and other types of MLLMs in practical applications, especially applications involving edge AI and machine perception. The content that follows (which is updated regularly) discusses these topics. As a starting point, we encourage you to watch the recording of the symposium “Your Next Computer Vision Model Might be an LLM: Generative AI and the Move From Large Language Models to Vision Language Models,” sponsored by the Edge AI and Vision Alliance. A preview video of the symposium introduction by Jeff Bier, Founder of the Alliance, is also available.


If there are topics related to LLMs, VLMs, or other types of MLLMs that you’d like to learn about but don’t find covered below, please email us at [email protected] and we’ll consider adding content on these topics in the future.

Technologies Driving Enhanced On-device Generative AI Experiences: LoRA

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

Utilize low-rank adaptation (LoRA) to provide customized experiences across use cases

Enhancing contextualization and customization has always been a driving force in the realm of user experience. While generative artificial intelligence (AI) has already demonstrated its transformative …

Technologies Driving Enhanced On-device Generative AI Experiences: Multimodal Generative AI

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

Leverage additional modalities in generative AI models to enable necessary advancements for contextualization and customization across use cases

A constant desire in user experience is improved contextualization and customization. For example, consumers want devices to automatically use …

Qualcomm AI Hub Expands to On-device AI Apps for Snapdragon-powered PCs

Highlights: Qualcomm AI Hub expands to support Snapdragon X Series Platforms, empowering developers to easily take advantage of the best-in-class CPU and the world’s fastest NPU for laptops, and create responsive and power-efficient on-device generative AI applications for next-gen Windows PCs. Developers can now optimize their own models using the Qualcomm AI Hub, adding flexibility and …

Ambarella’s Next-Gen AI SoCs for Fleet Dash Cams and Vehicle Gateways Enable Vision Language Models and Transformer Networks Without Fan Cooling

Two New 5nm SoCs Provide Industry-Leading AI Performance Per Watt, Uniquely Allowing Small Form Factor, Single Boxes With Vision Transformers and VLM Visual Analysis

SANTA CLARA, Calif., May 21, 2024 – Ambarella, Inc. (NASDAQ: AMBA), an edge AI semiconductor company, today announced during AutoSens USA the latest generation of its AI systems-on-chip (SoCs) for in-vehicle …

AiM Future Brings GenAI Applications to Mainstream Consumer Devices

Seoul, Korea, and San Jose, CA – May 15, 2024 – AiM Future, a leading provider of concurrent multimodal inference accelerators for edge and endpoint devices, has just announced the launch of its next-generation Generative AI Architecture, “GAIA,” and its Synabro software development kit. These GAIA-based accelerators are designed to enable energy-efficient transformers and large language …

“Generative AI: How Will It Impact Edge Applications and Machine Perception?,” An Embedded Vision Summit Expert Panel Discussion

Sally Ward-Foxton, Senior Reporter at EE Times, moderates the “Generative AI: How Will It Impact Edge Applications and Machine Perception?” Expert Panel at the May 2023 Embedded Vision Summit. Panelists include Greg Kostello, CTO and Co-Founder of Huma.AI; Vivek Pradeep, Partner Research Manager at Microsoft; Steve Teig, CEO of Perceive; and Roland Memisevic, Senior …

“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a Keynote Presentation from Kristen Grauman

Kristen Grauman, Professor at the University of Texas at Austin and Research Director at Facebook AI Research, presents the “Frontiers in Perceptual AI: First-person Video and Multimodal Perception” keynote at the May 2023 Embedded Vision Summit. First-person or “egocentric” perception requires understanding the video and multimodal data that streams from wearable cameras and other sensors.

Here you’ll find a wealth of practical technical insights and expert advice to help you bring AI and visual intelligence into your products without flying blind.

Contact

Address

Berkeley Design Technology, Inc.
PO Box #4446
Walnut Creek, CA 94596

Phone
+1 (925) 954-1411