This blog post was originally published at Ambarella’s website. It is reprinted here with the permission of Ambarella.
Ambarella recently partnered with Omdia to commission an independent white paper on the future of Generative AI at the edge, authored by Omdia’s Principal Analyst for Advanced AI Computing, Alexander Harrowell. It combines his insights with Omdia’s data on the significant increase in performance and capabilities that the deployment of Large Language Models and GenAI at the edge will bring about, and the market opportunities that presents.
We followed that up with a joint webinar in which Alex and I offered added perspective and answered attendee questions on the trends and technologies for achieving scalable GenAI in edge-endpoint devices and on-premises hardware. The following are my key takeaways from both the paper and the webinar…
Smaller Models and More Efficient Compute
In the rapidly evolving landscape of AI processing, we are seeing a significant increase in the performance and capabilities of devices through the deployment of Large Language Models (LLMs) and generative AI at the edge. With the proliferation of open-source LLMs, innovation has shifted from creating huge models to developing and fine-tuning smaller models that can be deployed at the edge and tailored for specific use cases. For example, the contrastive language–image pre-training (CLIP) vision-language model (VLM) has fewer than 1 billion parameters.
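To make that concrete, here is a minimal sketch of zero-shot image classification with a sub-1B-parameter CLIP checkpoint via the Hugging Face transformers library. The checkpoint name, image file, and candidate labels are illustrative assumptions, not tied to any particular deployment:

```python
# Zero-shot classification with a small CLIP VLM (~150M parameters).
# "frame.jpg" and the labels below are placeholders for illustration.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

checkpoint = "openai/clip-vit-base-patch32"  # one of the smaller public CLIP variants
model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)

image = Image.open("frame.jpg")  # e.g., a frame grabbed from a camera feed
labels = ["a delivery truck", "a pedestrian", "an empty street"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # per-label match probabilities
print(dict(zip(labels, probs[0].tolist())))
```

Because CLIP matches images against arbitrary text, the same small model covers many classification tasks without retraining, which is part of what makes it attractive for edge deployment.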
These smaller models put the powerful, cutting-edge technology of generative AI within reach for edge inference. However, they still demand substantial compute and memory, posing new challenges and constraints that robust hardware solutions will need to address.
Ambarella’s CVflow® 3.0 AI SoC architecture is natively well suited to multimodal processing of video and GenAI models simultaneously at very low power. On the high end, our N1 SoC series runs Llama2-13B in single-streaming mode at under 50 W. And the CLIP model I mentioned runs handily on our CV7 SoC family while processing four video feeds, with the whole system consuming less than 5 W.
Democratization of Model Development
AI model development is no longer limited to a select few with access to massive computing resources. The rapid growth of open-source models has transformed not just the models themselves but how they are developed and deployed, making it easy to start with a pretrained LLM and fine-tune it into a version for your specific application.
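As a sketch of that workflow, the snippet below attaches parameter-efficient LoRA adapters to an open-source base model with the Hugging Face peft library, so only a small fraction of the weights are trained. The base model name, data file, and hyperparameters are placeholder assumptions, not a prescribed recipe:

```python
# LoRA fine-tuning sketch: adapt a pretrained LLM to a specific application.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # placeholder: any open-source base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without one
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all base weights,
# which is what puts fine-tuning within reach of modest hardware.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# "my_domain_data.jsonl" is hypothetical: one {"text": ...} record per line.
data = load_dataset("json", data_files="my_domain_data.jsonl")["train"]
data = data.map(lambda x: tokenizer(x["text"], truncation=True, max_length=512))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```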
Accompanying this new wave of AI accessibility is a shift in focus from developing large-scale models (more than 50 billion parameters), typically run by hyperscalers for high-volume AI inferencing, to what Alex calls “the missing middle” (3B–50B parameter models) that can be deployed outside the data center, onto edge servers and endpoint devices. Anything with 8 GB of available RAM is now a potential LLM platform, especially if it has a unified memory architecture.
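A quick back-of-envelope calculation shows why 8 GB is a meaningful threshold. It assumes weight memory dominates and ignores the KV cache and runtime overhead:

```python
# Approximate weight memory for "missing middle" models at various precisions.
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params in (3, 7, 13):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

A 7B model quantized to 4 bits needs roughly 3.5 GB for weights alone, fitting comfortably in 8 GB; a 13B model at 4 bits (~6.5 GB) is marginal once the KV cache and runtime overhead are added.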
A Growing Demand for Compute at the Edge
Although these smaller, lightweight models can be developed and run on cheaper hardware, lowering costs from a data center or cloud standpoint, they are driving demand for more compute and memory at the edge, presenting new challenges in terms of price, power, size, and scalability.
The transition from basic computer vision models to small LLMs is enabling state-of-the-art capabilities, such as multimodal AI and prompt engineering, in new edge and embedded products. However, to fully realize the benefits of deploying generative AI at the edge, businesses will need to choose the right hardware solution for their use case.
Visual Analytics Is Driving Growth in Generative AI
In the realm of generative AI, visual analytics stands out as the biggest growth driver in terms of market opportunity, with a widening range of applications that essentially use a camera as a sensor. Within this category, industry verticals such as industrial automation, robotics, smart cities, and medical devices stand to benefit greatly from innovations in generative AI. The ability to use on-premises cameras or devices, maintaining data privacy while benefiting from low-latency processing, enables new technologies such as contextual search and natural language queries to provide deeper insights for end users.
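To illustrate how a VLM enables such natural language queries, the sketch below (with hypothetical file names, not a product API) embeds camera frames and a text query into CLIP’s shared space and ranks the frames by cosine similarity:

```python
# Natural language search over camera frames via shared image/text embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frames = [Image.open(p) for p in ("frame_001.jpg", "frame_002.jpg")]  # placeholders
query = "a person carrying a ladder near the loading dock"

with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=frames, return_tensors="pt"))
    txt_emb = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))

# Normalize, then score each frame against the query by cosine similarity.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (img_emb @ txt_emb.T).squeeze(-1)
best = int(scores.argmax())
print(f"best match: frame {best} (score {scores[best].item():.3f})")
```

In practice the frame embeddings would be precomputed and indexed on the device, so each query costs only one text-encoder pass plus a vector lookup, and the footage never leaves the premises.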
The Future of Generative AI at the Edge
The democratization of AI model development is revolutionizing the AI landscape, driving innovation and cost reductions across the board. Smaller, tailored models built for specific use cases are making generative AI more accessible for a wide range of new edge devices and embedded systems.
As visual analytics continues to drive growth in generative AI, businesses will need the right hardware solution to leverage these new technologies for enhanced capabilities and insights. The next evolution of AI requires not just performance, but the right balance between efficiency and scalability.
To learn more about the current and future trends for generative AI at the edge, download Omdia’s white paper at www.ambarella.com/cooper, where you can also view our on-demand joint webinar with Omdia. No registration form is required.
Amit Badlani
Director of Generative AI and Robotics, Ambarella