This blog post was originally published on Intel’s website. It is reprinted here with the permission of Intel.
Multimodal models are becoming essential for AI, enabling the integration of diverse data types into a single model for richer insights. During the second Intel® Liftoff Days 2024 hackathon, Rahul Nair’s workshop on Inference of Multimodal Models offered a deep dive into optimizing these models for inference on Intel CPUs and GPUs.
The workshop covered practical techniques to enhance model performance using Intel’s tools and extensions for PyTorch. A key focus was on the ipex.optimize function from Intel® Extension for PyTorch*, demonstrating how it can significantly optimize model inference on Intel hardware.
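To make that concrete, here is a minimal sketch of the inference-side pattern described in the workshop, assuming Intel® Extension for PyTorch* is installed and the hardware supports BF16. The toy model and shapes are illustrative stand-ins, not the workshop’s exact setup:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Placeholder model for illustration; in practice this would be the
# multimodal model you want to serve.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# In inference mode, ipex.optimize returns an optimized copy of the model,
# applying Intel-specific data layouts and operator fusions.
model = ipex.optimize(model, dtype=torch.bfloat16)

# BF16 autocast on CPU; the same idea applies on Intel GPUs with the "xpu" device.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    output = model(torch.randn(4, 128))
```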
In this recap, we’ll break down the key points from Rahul’s workshop, including optimization strategies for inference on Intel hardware and best practices for utilizing Intel’s software tools to improve multimodal AI performance.
Here’s a breakdown of the third workshop:
The workshop’s first part, led by Rahul Nair, focused on the fundamentals of fine-tuning multimodal models. Rahul began by emphasizing the significance of fine-tuning in adapting large models to specialized tasks. “Fine-tuning helps customize the model to give concise answers or perform specific tasks, such as reading text from images,” he noted.
To illustrate this, Rahul used the Moondream model, a 1.9-billion-parameter architecture that combines a Vision Transformer (ViT) for image embedding with a simple MLP that projects those embeddings into a format the language model component can understand. This combination allows for effective handling of mixed image and text inputs, showcasing the versatility and power of multimodal models.
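As a rough illustration of that design (not Moondream’s actual code, and with hypothetical dimensions), the vision-to-language bridge can be sketched as a small MLP that maps ViT patch embeddings into the language model’s hidden size:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps ViT patch embeddings to the language model's hidden size.

    Dimensions here are illustrative, not Moondream's actual configuration.
    """
    def __init__(self, vit_dim: int = 1152, lm_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vit_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, image_embeddings: torch.Tensor) -> torch.Tensor:
        # image_embeddings: (batch, num_patches, vit_dim) from the ViT
        return self.mlp(image_embeddings)  # (batch, num_patches, lm_dim)

# The projected image tokens are concatenated with the text token embeddings
# and fed to the language model as a single sequence.
projector = VisionProjector()
fake_vit_output = torch.randn(1, 729, 1152)  # hypothetical patch embeddings
image_tokens = projector(fake_vit_output)
```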
Next, Rahul walked through the model loading and optimization process, guiding participants on how to prepare the model for training. One of the core techniques discussed was using Intel® Extension for PyTorch* to boost model performance. “The ipex.optimize function helps improve model performance by adjusting data formats and optimizing operations,” Rahul added, highlighting how Intel’s software tools can be leveraged to maximize hardware capabilities.
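For the fine-tuning side, the same function also accepts the optimizer. A hedged sketch of that flow, using a generic Hugging Face checkpoint as a stand-in for the workshop’s model (the checkpoint name and learning rate are illustrative assumptions):

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; substitute the model you actually want to fine-tune.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# In training mode, ipex.optimize takes the optimizer as well and returns
# both, with data layouts and fused kernels tuned for Intel hardware.
model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)
```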
The second part of the workshop featured presentations from startups such as PeopleSense and Kai, whose teams shared their experiences and challenges with using Intel’s AI tools.
Dr. Harsh Verma from PeopleSense highlighted difficulties with setting up the development environment on Intel’s DevCloud, while Gilberto Pardo from Kai discussed issues related to managing large datasets for model training. These discussions allowed participants to explore practical solutions, such as using synthetic data generation and optimizing the DevCloud setup, demonstrating the value of community support and knowledge sharing in the Intel® Liftoff ecosystem.
The session wrapped up with a general discussion on practical applications and collaborative problem-solving within the community. Participants emphasized the importance of leveraging Intel’s tools and the support available through the Liftoff program to accelerate AI development. The workshop reinforced the value of fine-tuning and optimization in developing effective AI models, providing actionable insights for those looking to push the boundaries of what AI can achieve.
Eugenie Wirz emphasized the value of collaboration and continuous learning within the AI community. “We are here to help each other and solve technical challenges together,” Eugenie stated, highlighting the collaborative spirit of the Intel® Liftoff community.
Jade Worrall
Developer Relations and Engagement Specialist, Intel