LLMs and MLLMs
The past decade-plus has seen incredible progress in practical computer vision. Thanks to deep learning, computer vision is dramatically more robust and accessible, and has enabled compelling capabilities in thousands of applications, from automotive safety to healthcare. But today’s widely used deep learning techniques suffer from serious limitations. Often, they struggle when confronted with ambiguity (e.g., are those people fighting or dancing?) or with challenging imaging conditions (e.g., is that shadow in the fog a person or a shrub?). And, for many product developers, computer vision remains out of reach due to the cost and complexity of obtaining the necessary training data, or due to lack of necessary technical skills.
Recent advances in large language models (LLMs) and their variants, such as vision language models (VLMs, which comprehend both images and text), hold the key to overcoming these challenges. VLMs are an example of multimodal large language models (MLLMs), which integrate multiple data modalities such as language, images, audio, and video to enable complex cross-modal understanding and generation tasks. MLLMs represent a significant evolution in AI by combining the capabilities of LLMs with multimodal processing to handle diverse inputs and outputs.
The purpose of this portal is to facilitate awareness of, and education regarding, the challenges and opportunities in using LLMs, VLMs, and other types of MLLMs in practical applications, especially applications involving edge AI and machine perception. The content that follows (which is updated regularly) discusses these topics. As a starting point, we encourage you to watch the recording of the symposium “Your Next Computer Vision Model Might be an LLM: Generative AI and the Move From Large Language Models to Vision Language Models”, sponsored by the Edge AI and Vision Alliance. A preview video of the symposium introduction by Jeff Bier, Founder of the Alliance, follows:
If there are topics related to LLMs, VLMs or other types of MLLMs that you’d like to learn about and don’t find covered below, please email us at [email protected] and we’ll consider adding content on these topics in the future.
View all LLM and MLLM Content
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/02/669a788d673dd368057cab77_RAG-for-Vision-8-p-1600-300x169.png)
RAG for Vision: Building Multimodal Computer Vision Systems
This blog post was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. This article explores the exciting world of Visual RAG, covering its significance and how it’s revolutionizing traditional computer vision pipelines. From understanding the basics of RAG to its specific applications in visual tasks and surveillance, we’ll examine
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/AI_in_business-ea7b7e7d-300x176.jpeg)
The Future of AI in Business: Trends to Watch
This blog post was originally published at Digica’s website. It is reprinted here with the permission of Digica. In a world increasingly shaped by the rapid evolution of artificial intelligence, 2024 stands as another momentous year, with advancements that continue to reshape how we live, work, and imagine our future. From the rapid acceleration in
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/66830dff7cfe65cfe08e01eb_2022-2-p-1600-300x169.png)
Multimodal Large Language Models: Transforming Computer Vision
This blog post was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. This article introduces multimodal large language models (MLLMs) [1], their applications using challenging prompts, and the top models reshaping computer vision as we speak. What is a multimodal large language model (MLLM)? In layman’s terms, a multimodal
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/support_1-8c346948-300x175.jpeg)
Harnessing the Power of LLM Models on Arm CPUs for Edge Devices
This blog post was originally published at Digica’s website. It is reprinted here with the permission of Digica. In recent years, the field of machine learning has witnessed significant advancements, particularly with the development of Large Language Models (LLMs) and image generation models. Traditionally, these models have relied on powerful cloud-based infrastructures to deliver impressive
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/AI-on-the-road-why-ai-powered-cars-are-the-future-300x169.jpg)
AI On the Road: Why AI-powered Cars are the Future
This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm. AI transforms your driving experience in unexpected ways as showcased by Qualcomm Technologies collaborations. As automotive technology rapidly advances, consumers are looking for vehicles that deliver AI-enhanced experiences through conversational voice assistants and sophisticated user interfaces. Automotive
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/edge-intelligence-and-interoperability-are-key-to-smart-home-300x200.jpg)
Edge Intelligence and Interoperability are the Key Components Driving the Next Chapter of the Smart Home
This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm. The smart home industry is on the brink of a significant leap forward, fueled by generative AI and edge capabilities. The smart home is evolving to include advanced capabilities, such as digital assistants that interact like friends
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/video-models-nemo-framework-featured-300x169.png)
Accelerate Custom Video Foundation Model Pipelines with New NVIDIA NeMo Framework Capabilities
This blog post was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA. Generative AI has evolved from text-based models to multimodal models, with a recent expansion into video, opening up new potential uses across various industries. Video models can create new experiences for users or simulate scenarios for training
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/how-ai-on-the-edge-fuels-7-tech-trends-2025-300x169.jpg)
How AI On the Edge Fuels the 7 Biggest Consumer Tech Trends of 2025
This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm. From more on-device AI features on your phone to the future of cars, 2025 is shaping up to be a big year. Over the last two years, generative AI (GenAI) has shaken up, well, everything. Heading into
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/omniverse-generative-physical-ai-300x169.png)
NVIDIA Expands Omniverse With Generative Physical AI
New Models, Including Cosmos World Foundation Models, and Omniverse Mega Factory and Robotic Digital Twin Blueprint Lay the Foundation for Industrial AI Leading Developers Accenture, Altair, Ansys, Cadence, Microsoft and Siemens Among First to Adopt Platform Libraries January 6, 2025 — CES — NVIDIA today announced generative AI models and blueprints that expand NVIDIA Omniverse™
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/NVIDIA-Cosmos-300x169.jpg)
NVIDIA Launches Cosmos World Foundation Model Platform to Accelerate Physical AI Development
New State-of-the-Art Models, Video Tokenizers and an Accelerated Data Processing Pipeline, Optimized for NVIDIA Data Center GPUs, Are Purpose-Built for Developing Robots and Autonomous Vehicles First Wave of Open Models Available Now to Developer Community Global Physical AI Leaders 1X, Agile Robots, Agility, Figure AI, Foretellix, Uber, Waabi and XPENG Among First to Adopt January 6, 2025 —
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/rtx-ai-pcs-300x169.png)
NVIDIA Launches AI Foundation Models for RTX AI PCs
NVIDIA NIM Microservices and AI Blueprints Help Developers and Enthusiasts Build AI Agents and Creative Workflows on PC January 6, 2025 — CES — NVIDIA today announced foundation models running locally on NVIDIA RTX™ AI PCs that supercharge digital humans, content creation, productivity and development. These models — offered as NVIDIA NIM™ microservices — are accelerated by
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/ilo-days-ws-05-300x129.jpg)
Optimizing Multimodal AI Inference
This blog post was originally published at Intel’s website. It is reprinted here with the permission of Intel. Multimodal models are becoming essential for AI, enabling the integration of diverse data types into a single model for richer insights. During the second Intel® Liftoff Days 2024 hackathon, Rahul Nair’s workshop on Inference of Multimodal Models
![](https://www.edge-ai-vision.com/wp-content/uploads/2025/01/OnPremAIApplianceSolution-217x300.jpg)
Qualcomm Launches On-prem AI Appliance Solution and Inference Suite to Step-up AI Inference Privacy, Flexibility and Cost Savings Across Enterprise and Industrial Verticals
Highlights: Qualcomm AI On-Prem Appliance Solution is designed for generative AI inference and computer vision workloads on dedicated on-premises hardware – allowing sensitive customer data, fine-tuned models, and inference loads to remain on premises. Qualcomm AI Inference Suite provides ready-to-use AI applications and agents, tools and libraries for operationalizing AI from computer vision to generative
![](https://www.edge-ai-vision.com/wp-content/uploads/2024/12/why-generative-AI-is-the-catalyst-that-mixed-reality-needs-300x169.jpg)
Why Generative AI is the Catalyst That Mixed Reality Needs
This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm. From content creation to digital avatars, generative AI is the critical ingredient for building immersive worlds in mixed reality. The promise of mixed reality fundamentally changing the way we interact and live our lives has always been
![](https://www.edge-ai-vision.com/wp-content/uploads/2024/12/recap-2024-nv-blog-1280x680-1-300x159.jpg)
From Generative to Agentic AI, Wrapping the Year’s AI Advancements
This blog post was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA. Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.
![](https://www.edge-ai-vision.com/wp-content/uploads/2024/12/37ZyhwgYqCY-300x169.jpg)
Qualcomm CEO Cristiano Amon at Web Summit: GenAI is the New UI
This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm. How generative AI (GenAI)-powered “agents” will change the way you interact with the digital world. The rise of artificial intelligence (AI) opens the door to a vast array of possibilities. AI-powered agents will be the key to