Check out the expert panel discussion “Multimodal LLMs at the Edge: Are We There Yet?” at the upcoming 2024 Embedded Vision Summit, taking place May 21-23 in Santa Clara, California! The Summit is the premier conference for innovators incorporating computer vision and edge AI in products. It attracts a global audience of technology professionals from companies developing computer vision and edge AI-enabled products, including embedded systems, cloud solutions and mobile applications. Visit the Summit website for more information, and then register today! Don’t forget to spread the word to your colleagues, too. We look forward to seeing you there!
Large language models (LLMs) are fueling a revolution in AI. And while chatbots are the most visible manifestation of LLMs, the use of multimodal LLMs for visual perception—for example, vision language models like LLaVA that are capable of understanding both text and images—may ultimately have greater impact, given that so many AI use cases require an understanding of both language concepts and visual data, rather than language alone. To what extent—and how quickly—will multimodal LLMs change how we do computer vision and other types of machine perception? Are they needed for real-world applications, or are they a solution looking for a problem? If they are needed, are they needed at the edge? What will be the main challenges in running them there? Is it the nature of the computation, the amount of computation, memory bandwidth, ease of development or some other factor? Is today’s edge hardware up to the task? If not, what will it take to get there? To answer these and many other questions around the rapidly evolving role of multimodal LLMs in machine perception applications at the edge, we’ve assembled an amazing set of panelists who have firsthand experience with these models and the challenges of implementing them at the edge. Join us for a lively and insightful discussion!
Sally Ward-Foxton covers AI technology and related issues for EETimes.com and all aspects of the European semiconductor industry for EETimes Europe magazine. Sally has spent more than 15 years writing about the electronics industry from London, UK. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more. She holds a master’s degree in Electrical and Electronic Engineering from the University of Cambridge.
Adel Ahmadyan is a Staff Engineer at Meta Reality Labs, where he is the Technical Lead for the development of multimodal systems and large vision-language models. Prior to joining Meta, Dr. Ahmadyan was a key contributor at Google, where he focused on on-device machine learning. His work at Google was instrumental in advancing the company’s on-device ML capabilities and enabled features across many products, including Google Pixel, Meet, Photos and YouTube. With over a decade of industry experience in computer vision, he has consistently pushed the boundaries of real-time computer vision and live perception on-device and at the edge. Adel holds a PhD from the University of Illinois Urbana-Champaign, as well as master’s and bachelor’s degrees from Sharif University of Technology. Adel lives in San Francisco and spends most of his spare time in the Sierras.
Dr. Jilei Hou is a VP of Engineering and the Head of AI Research at Qualcomm Technologies. Jilei obtained his PhD from UCSD and joined Qualcomm in 2003. He made substantial contributions in technology innovation, standardization and product commercialization across wireless 3G/4G/5G standards. In 2011, he moved to Beijing and became the Head of Qualcomm Research China, where he initiated 5G research and intelligent robotics programs. In 2017, he moved back to San Diego to lead AI Research, developing the AI research infrastructure, driving technical innovations for next-generation hardware and software platforms and leading research to benefit technology verticals such as mobile, auto and XR. Jilei has built a world-class AI R&D team with technology leadership in power-efficient AI, on-device AI and GenAI. He is an IEEE Senior Member and has participated in several Frontiers of Engineering (FOE) symposia organized by the US and Chinese National Academies of Engineering.
Pete Warden is a thinker, innovator and entrepreneur in AI, software and big data. In 2003, Pete created a set of image processing filters to detect features in video content, which Apple purchased. Later, he co-founded Jetpac, which created a product that analyzed millions of photos and generated in-depth guides for more than 5,000 cities. Pete joined Google in 2014 when the company acquired Jetpac. At Google, he led the development of the TensorFlow Lite framework, including an experimental version of TensorFlow Lite for microcontrollers. In 2022, Pete co-founded Useful Sensors, where he is the CEO. Useful Sensors creates low-cost, easy-to-integrate modules that bring capabilities like gesture recognition and presence detection to everyday appliances. Pete is the author of three O’Reilly books. He earned his BS in Computer Science from the University of Manchester and is currently enrolled in a PhD program at Stanford University.
Yong Jae Lee is an Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. His research interests are in computer vision and machine learning, with a focus on robust visual recognition systems that learn to understand the visual world with minimal human supervision. Before joining UW-Madison in 2021, he spent one year as an AI Visiting Faculty at Cruise and six years as an Assistant and then Associate Professor at UC Davis. He received his PhD from the University of Texas at Austin in 2012 and was a postdoc at Carnegie Mellon University (2012-2013) and UC Berkeley (2013-2014). Professor Lee is co-author of the widely cited paper “Visual Instruction Tuning,” which proposes LLaVA (large language and vision assistant), an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. He is also co-author of “Segment Everything Everywhere All at Once,” which proposes a novel decoding mechanism enabling diverse prompting for all types of segmentation tasks. Professor Lee is a recipient of the ARO Young Investigator Program Award (2017), UC Davis Hellman Fellowship (2017), NSF CAREER Award (2018), AWS Machine Learning Research Award (2018 and 2019), Adobe Data Science Research Award (2019 and 2022), UC Davis College of Engineering Outstanding Junior Faculty Award (2019), Sony Focused Research Award (2020 and 2023) and UW-Madison SACM Student Choice Professor of the Year Award (2022). He and his collaborators received the Most Innovative Award at the COCO Object Detection Challenge (ICCV 2019) and the Best Paper Award at BMVC 2020.