Dinesh Jayaraman, Assistant Professor at the University of Pennsylvania, presents the “Vision-language Representations for Robotics” tutorial at the May 2023 Embedded Vision Summit.
In what format can an AI system best present what it “sees” in a visual scene to help robots accomplish tasks? This question has been a long-standing challenge for computer scientists and robotics engineers. In this presentation, Jayaraman provides insights into cutting-edge techniques being used to help robots better understand their surroundings, learn new skills with minimal guidance, and perform more complex tasks.
Jayaraman discusses recent advances in unsupervised representation learning and explains how these approaches can be used to build visual representations suited to a controller that decides how the robot should act. In particular, he presents insights from his research group’s recent work on how to represent the constituent objects and entities in a visual scene, and how to combine vision and language so that language-based task descriptions can be effectively translated into images depicting the robot’s goals.
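To make the idea of connecting language-based task descriptions to visual goals concrete, here is a minimal, illustrative sketch (not Jayaraman’s actual method): it uses a pretrained CLIP vision-language model, assuming the Hugging Face transformers and Pillow packages, to score hypothetical candidate goal images against a task description and pick the best match as a visual goal for a controller.

```python
# Minimal sketch (illustrative only, not the speaker's method): score candidate
# goal images against a language task description with a pretrained CLIP model.
# Assumes the Hugging Face `transformers` and `Pillow` packages are installed;
# the image file names below are hypothetical.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

task = "put the red block inside the open drawer"  # language task description
candidate_images = [Image.open(p) for p in ["goal_a.png", "goal_b.png"]]

inputs = processor(text=[task], images=candidate_images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the text and each candidate image; the
# highest-scoring image can then serve as a visual goal for the controller.
scores = outputs.logits_per_text.squeeze(0)
best = int(scores.argmax())
print(f"Best-matching goal image: candidate {best}, score {scores[best]:.3f}")
```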
See here for a PDF of the slides.