LLMs, MoE and NLP Take Center Stage: Key Insights From Qualcomm’s AI Summit 2023 On the Future of AI

This blog post was originally published at Qualcomm’s website. It is reprinted here with the permission of Qualcomm.

Experts at Microsoft, Duke and Stanford weigh in on the advancements and challenges of AI

Qualcomm’s annual internal artificial intelligence (AI) Summit brought together industry experts and Qualcomm employees from all over the world in San Diego in December 2023 to discuss the future of AI and its potential impact on various industries. We heard varying perspectives from experts at Microsoft, Duke University and Stanford University, who also gave us insights into the work being done on large language models (LLMs), mixture-of-experts (MoE) models, natural language processing (NLP) and more. Read on for the key takeaways from each talk.

How can we deploy LLMs successfully?

Key takeaways:

  1. Model size matters, and larger models can produce better results.
  2. The amount of compute needed to train top models is increasing, doubling every four to five months.
  3. Solutions are needed to better deal with the memory constraints of devices.

Marc Tremblay, vice president and distinguished engineer of silicon technology and strategy at Microsoft, discusses the challenges and advancements in building LLMs. He explains the general architecture of LLMs and the importance of factors like access to the web, responsible AI and compliance checks in deploying them as commercial entities. He also discusses the exponential growth in compute power required for training these models and the need for significant capital expenditures to support this growth. Tremblay says,

This is the amount of compute needed to train the model of the day. It’s kind of growing at Moore’s law’s rate of two times every two years roughly, then it hits a deep learning era. And now the compute that we need doubles every four to five months.
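
To put that doubling rate in perspective, here is a quick back-of-the-envelope comparison of annual growth under Moore’s law versus a four-to-five-month doubling period (our own arithmetic, not Tremblay’s figures):

```python
# Back-of-the-envelope growth rates (illustrative only)
moore_per_year = 2 ** (12 / 24)     # doubling every 24 months -> ~1.4x per year
dl_era_per_year = 2 ** (12 / 4.5)   # doubling every ~4.5 months -> ~6.3x per year
print(f"Moore's law: ~{moore_per_year:.1f}x/year; deep learning era: ~{dl_era_per_year:.1f}x/year")
```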

However, he remains optimistic about the potential for continued advancements in LLMs and the exciting possibilities they offer. He also delves into the specifics of generative AI inference, focusing on the challenges and optimizations involved in the token-generation phase. Tremblay emphasizes the importance of identifying bottlenecks, exploiting parallelism and leveraging hardware assets to improve performance and efficiency.
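
To make the token-generation phase concrete, the sketch below (our illustration, not Microsoft’s implementation) shows single-head autoregressive decoding with a key-value cache: every new token re-reads the entire cache, which is why this phase tends to be memory-bandwidth-bound rather than compute-bound.

```python
import numpy as np

def attend(q, K, V):
    # Single-head scaled dot-product attention over all cached keys/values.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V

dim, prompt_len, new_tokens = 64, 128, 32
K = np.random.randn(prompt_len, dim)        # keys cached during the prompt (prefill) phase
V = np.random.randn(prompt_len, dim)        # values cached during prefill

for _ in range(new_tokens):                 # token-generation phase: one token at a time
    q = np.random.randn(dim)                # stand-in for the current token's query vector
    context = attend(q, K, V)               # each step re-reads the whole cache: memory-bound
    k_new, v_new = np.random.randn(2, dim)  # stand-ins for the new token's key and value
    K = np.vstack([K, k_new[None]])         # the cache grows by one row per generated token
    V = np.vstack([V, v_new[None]])
```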

As the field of AI evolves, it is transforming numerous industries. Cutting-edge models continue to grow in both scale and complexity, which makes deploying them on edge devices, with their limited compute and storage resources, a substantial challenge.

Reconciling large models and memory-constrained devices

Key takeaways:

  1. The gap between model size and device capabilities is a significant challenge in AI.
  2. Leveraging smaller models for training can improve efficiency.
  3. Collaborative AI can help overcome the limitations of on-device AI.

Yiran Chen, electrical and computer engineering professor at Duke University, gives a presentation introducing recent work called Sparsity-inspired Data-Aware (SiDA), which facilitates the deployment of large-scale MoE LLMs on memory-constrained devices. He also speaks to collaborative work with Qualcomm Technologies on AutoML, which enables the deployment of advanced recommendation systems on Qualcomm Technologies’ systems-on-chip (SoCs). Chen says,

Collaborative AI is attractive, because if we cannot handle these things in the local, we need to handle them in a collaborative way. The fact that the computational resources and model are offered by different parties is another reason to consider this.
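
A helpful way to see why SiDA-style approaches help is the sparsity inside an MoE layer: only a handful of experts are activated per token, so experts that are not routed to never execute, and on a memory-constrained device they need not even be resident. The PyTorch sketch below is a generic top-k MoE layer for illustration only, not the SiDA implementation.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative, not SiDA)."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)     # gating network
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):                             # x: (num_tokens, dim)
        scores = self.router(x)                       # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, -1)    # each token picks only its top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e                 # tokens routed to expert e in slot k
                if mask.any():                        # unselected experts never run
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(16, 64)
print(layer(tokens).shape)                            # torch.Size([16, 64])
```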

Chen presents several of his team’s federated learning projects that promote private, more accurate distributed intelligence by addressing the resource limitations in on-device applications of foundation models.
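
For context, the core aggregation step in federated learning can be sketched in a few lines: devices train locally on private data, and the server only ever sees model updates, which it averages weighted by each client’s data size. This is a generic FedAvg-style sketch under our own assumptions, not a description of Chen’s specific projects.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average client parameters weighted by local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy round: three devices produce local updates; raw data never leaves the device.
global_weights = np.zeros(4)
local_updates = [global_weights + 0.1 * np.random.randn(4) for _ in range(3)]
global_weights = federated_average(local_updates, client_sizes=[120, 80, 200])
print(global_weights)
```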

How can we further improve natural language processing pipelines?

Key takeaways:

  1. AI system design needs to move beyond manual prompt engineering and embrace data-driven optimization.
  2. The DSPy library enables modular, composable systems and data-driven optimization.
  3. The future of AI system design may lie in local models for reproducible research.

Christopher Potts, chair of linguistics and professor of computer science at Stanford University, talks about how language models are enabling researchers to build NLP systems at higher levels of abstraction and with lower data requirements than ever before. However, these systems are often built around long, complex, hand-crafted prompt templates, which is akin to setting the weights of a classifier by hand rather than learning them from data.

Toward a more systematic approach, Potts and his team introduce DSPy, a programming model that abstracts language model pipelines as imperative computation graphs where language models are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation and reasoning techniques. The DSPy compiler will optimize any DSPy pipeline to maximize a given metric. Even quite simple DSPy programs, once compiled, routinely outperform standard pipelines with hand-created prompts and allow the development of performant systems using small language models. Potts says,

I think the future lies in us being able to get a lot of juice out of models that can run on our own devices inexpensively.

DSPy is an actively developed, fully open-source library for data-driven optimization in NLP tasks, and is available on GitHub.
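
As a flavor of what this looks like in practice, here is a minimal DSPy-style sketch: a one-module question-answering program whose prompt is compiled by an optimizer rather than hand-written. The model name, toy trainset and metric are placeholders, and DSPy’s configuration API has changed across releases, so treat this as an illustration rather than a copy-paste recipe.

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Point DSPy at a language model (placeholder client and model name; the
# configuration API differs between DSPy releases).
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

class SimpleQA(dspy.Module):
    """One-step pipeline: the prompt for this module is compiled, not hand-crafted."""
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("question -> answer")  # declarative signature

    def forward(self, question):
        return self.answer(question=question)

# A tiny toy trainset and metric, just to show the compilation step.
trainset = [dspy.Example(question="What does MoE stand for?",
                         answer="mixture of experts").with_inputs("question")]

def exact_match(example, prediction, trace=None):
    return example.answer.lower() in prediction.answer.lower()

optimizer = BootstrapFewShot(metric=exact_match)
compiled_qa = optimizer.compile(SimpleQA(), trainset=trainset)
print(compiled_qa(question="Where is Qualcomm headquartered?").answer)
```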

Looking towards the future of AI and machine learning

We look forward to collaborating further with these AI experts on successfully deploying machine learning on devices around the world. Whether it’s making NLP tasks less data-hungry, running generative AI in a more memory-efficient manner, or scaling LLMs while reducing compute, there are exciting challenges to tackle in AI in 2024.

Armina Stepan
Senior Marketing Comms Coordinator, Qualcomm Technologies Netherlands B.V.
