This blog post was originally published at NVIDIA’s website. It is reprinted here with the permission of NVIDIA.
Next Chat with RTX features showcased, TensorRT-LLM ecosystem grows, AI Workbench general availability, and NVIDIA NIM microservices launched.
Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, software, tools and accelerations for RTX PC users.
NVIDIA’s RTX AI platform includes tools and software development kits that help Windows developers create cutting-edge generative AI features to deliver the best performance on AI PCs and workstations.
At GTC — NVIDIA’s annual technology conference — a dream team of industry luminaries, developers and researchers has come together to learn from one another, fueling what’s next in AI and accelerated computing.
This special edition of AI Decoded from GTC spotlights the best AI tools currently available and looks at what’s ahead for the 100 million RTX PC and workstation users and developers.
Chat with RTX, the tech demo and developer reference project that quickly and easily allows users to connect a powerful LLM to their own data, showcased new capabilities and new models in the GTC exhibit hall.
The winners of the Gen AI on RTX PCs contest were announced Monday. OutlookLLM, Rocket League BotChat and CLARA were highlighted in one of the AI Decoded talks in the generative AI theater, and each is accelerated by NVIDIA TensorRT-LLM. Two other AI Decoded talks covered using generative AI in content creation and offered a deep dive on Chat with RTX.
Developer frameworks and interfaces with TensorRT-LLM integration continue to grow, as Jan.AI, LangChain, LlamaIndex and Oobabooga will all soon be accelerated — helping to grow the already more than 500 AI applications for RTX PCs and workstations.
NVIDIA NIM microservices are coming to RTX PCs and workstations. They provide pre-built containers, with industry standard APIs, enabling developers to accelerate deployment on RTX PCs and workstations. NVIDIA AI Workbench, an easy-to-use developer toolkit to manage AI model customization and optimization workflows, is now generally available for RTX developers.
These ecosystem integrations and tools will accelerate development of new Windows apps and features. And today’s contest winners are an inspiring glimpse into what that content will look like.
Hear More, See More, Chat More
Chat with RTX, or ChatRTX for short, uses retrieval-augmented generation, NVIDIA TensorRT-LLM software and NVIDIA RTX acceleration to bring local generative AI capabilities to RTX-powered Windows systems. Users can quickly and easily connect local files as a dataset to an open large language model like Mistral or Llama 2, enabling queries for quick, contextually relevant answers.
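The retrieval-augmented generation pattern ChatRTX uses boils down to two steps: find the local text chunks most relevant to a question, then prepend them to the prompt sent to the LLM so it answers from the user’s own data. The sketch below is a simplified, hypothetical illustration of that flow — not ChatRTX’s implementation — and uses plain word overlap where a real system would use an embedding index and a local TensorRT-LLM model.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Hypothetical stand-in for ChatRTX's pipeline: real systems rank chunks
# with vector embeddings and send the prompt to a local LLM.

def score(query: str, chunk: str) -> int:
    """Count query words that also appear in the chunk (toy relevance)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Prepend retrieved context so the model answers from local data."""
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The quarterly report shows revenue grew 12 percent.",
    "Meeting notes: ship the new feature by Friday.",
    "Recipe: mix flour, sugar and eggs.",
]
prompt = build_prompt("When does the new feature ship?", docs)
# The prompt now contains the meeting-notes chunk; ChatRTX would pass it
# to an open model such as Mistral or Llama 2.
```

The key design point is that retrieval happens entirely on the local machine, which is what keeps the user’s files private.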
Moving beyond text, ChatRTX will soon add support for voice, images and new models.
Users will be able to talk to ChatRTX with Whisper — an automatic speech recognition system that uses AI to process spoken language. When the feature becomes available, ChatRTX will be able to “understand” spoken language and provide text responses.
A future update will also add support for photos. By integrating OpenAI’s CLIP — Contrastive Language-Image Pre-training — users will be able to search by words, terms or phrases to find photos in their private library.
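CLIP makes that kind of photo search possible by embedding text and images into a shared vector space, so a search is just a nearest-neighbor lookup by cosine similarity. The sketch below illustrates the ranking step with hand-made toy vectors; in a real system the vectors would come from a CLIP text encoder and image encoder.

```python
import math

# Toy cross-modal search in the style of CLIP: text and images share an
# embedding space, and search ranks images by cosine similarity to the
# query vector. These vectors are invented for illustration only.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend image embeddings (filename -> vector).
image_index = {
    "beach.jpg":    [0.9, 0.1, 0.0],
    "mountain.jpg": [0.1, 0.9, 0.1],
    "dog.jpg":      [0.0, 0.2, 0.9],
}

def search(query_vec: list[float], index: dict, k: int = 1) -> list[str]:
    """Return the k image files most similar to the query embedding."""
    ranked = sorted(index, key=lambda f: cosine(query_vec, index[f]),
                    reverse=True)
    return ranked[:k]

# A query like "sunny beach" would encode to something near beach.jpg.
query = [0.8, 0.2, 0.0]
results = search(query, image_index)  # ['beach.jpg']
```

Because only the embeddings need to be compared at query time, the photo library itself never has to leave the user’s machine.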
A future update will also add support for Google’s Gemma and ChatGLM.
Developers can start with the latest version of the developer reference project on GitHub.
Generative AI for the Win
The NVIDIA Generative AI on NVIDIA RTX developer contest prompted developers to build a Windows app or plug-in.
Submissions were judged on three criteria: a short demo video posted to social media, the project’s relative impact and ease of use, and how effectively NVIDIA’s technology stack was used. Each of the three winners received a pass to GTC, including a spot in the NVIDIA Deep Learning Institute GenAI/LLM courses, and a GeForce RTX 4090 GPU to power future development work.
OutlookLLM gives Outlook users generative AI features — such as email composition — securely and privately in their email client on RTX PCs and workstations. It uses a local LLM served via TensorRT-LLM.
Rocket League BotChat, for the popular Rocket League game, is a plug-in that allows bots to send contextual in-game chat messages based on a log of game events, such as scoring a goal or making a save. Designed to be used only in offline games against bot players, the plug-in is configurable in many ways via its settings menu.
“I found that playing against bots that react to game events with in-game messages in near real time adds a new level of entertainment to the game, and I’m excited to share my approach to incorporating AI into gaming as a participant in this developer contest. The target audience for my project is anyone who plays Rocket League with RTX hardware.”
— Brian Caffey, Rocket League BotChat developer
CLARA (short for Command Line Assistant with RTX Acceleration) is designed to enhance the command line interface of PowerShell by translating plain English instructions into actionable commands. The extension runs locally and quickly, and keeps users in their PowerShell context. Once it’s enabled, users type their English instructions and press the Tab key to invoke CLARA. Installation is straightforward, with options for both script-based and manual setup.
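The interaction pattern CLARA implements — English in, a shell command out, substituted at the prompt — can be sketched in a few lines. CLARA performs the translation with a local, RTX-accelerated LLM; the lookup table below is a hypothetical stand-in so the flow is runnable without a model, and the instructions and commands are illustrative only.

```python
# Toy illustration of CLARA's interaction pattern: an English instruction
# is translated into a PowerShell command. The real project uses a local
# LLM; this lookup table is a hypothetical stand-in for that model call.

EXAMPLES = {
    "list files in this folder": "Get-ChildItem",
    "show running processes": "Get-Process",
    "print the working directory": "Get-Location",
}

def translate(instruction: str) -> str:
    """Map an English instruction to a PowerShell command (toy lookup)."""
    return EXAMPLES.get(instruction.lower().strip(), "# no translation found")

# In CLARA, the user types the instruction at the prompt and presses Tab;
# the translated command then replaces the typed text.
command = translate("Show running processes")  # "Get-Process"
```

The important property, which the sketch preserves, is that the user never leaves the PowerShell context: translation happens in place of tab completion.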
From the Generative AI Theater
GTC attendees can attend three AI Decoded talks on Wednesday, March 20, at the generative AI theater. These 15-minute sessions will guide the audience through ChatRTX and how developers can productize their own personalized chatbot; how each of the three contest winners showed some of the possibilities for generative AI apps on RTX systems; and a celebration of artists and the tools and methods they use, powered by NVIDIA technology.
In the creator session, Lee Fraser, senior developer relations manager for generative AI media and entertainment at NVIDIA, will explore why generative AI has become so popular. He’ll show off new workflows and how creators can rapidly explore ideas. Artists to be featured include Steve Talkowski, Sophia Crespo, Lim Wenhui, Erik Paynter, Vanessa Rosa and Refik Anadol.
Anadol also has an installation at the show that combines data visualization and imagery based on that data.
Ecosystem of Acceleration
Top creative app developers, like Blackmagic Design and Topaz Labs, have integrated RTX AI acceleration into their software. TensorRT doubles the speed of AI effects like rotoscoping, denoising, super-resolution and video stabilization in the DaVinci Resolve and Topaz apps.
“Blackmagic Design and NVIDIA’s ongoing collaborations to run AI models on RTX AI PCs will produce a new wave of groundbreaking features that give users the power to create captivating and immersive content, faster.”
— Rohit Gupta, director of software development at Blackmagic Design
TensorRT-LLM is being integrated with popular developer frameworks and ecosystems such as LangChain, LlamaIndex, Oobabooga and Jan.AI. Developers and enthusiasts can easily access the performance benefits of TensorRT-LLM through top LLM frameworks to build and deploy generative AI apps to both local and cloud GPUs.
Enthusiasts can also try out their favorite LLMs — accelerated with TensorRT-LLM on RTX systems — through the Oobabooga and Jan.AI chat interfaces.
AI That’s NIMble, AI That’s Quick
Developers and tinkerers can tap into NIM microservices. These pre-built AI “containers,” with industry-standard APIs, provide an optimized solution that helps to reduce deployment times from weeks to minutes. They can be used with more than two dozen popular models from NVIDIA, Getty Images, Google, Meta, Microsoft, Shutterstock and more.
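Because NIM containers expose industry-standard HTTP APIs, calling a deployed model is an ordinary JSON POST in the style of the widely used chat-completions convention. The sketch below assembles such a request; the host, port and model name are placeholders, not values from the article, and the actual network call is shown only in a comment.

```python
import json

# Hypothetical sketch of addressing a NIM LLM microservice through its
# industry-standard HTTP API. Endpoint path, host, port and model name
# are illustrative assumptions.

def build_chat_request(model: str, user_message: str,
                       base_url: str = "http://localhost:8000"):
    """Assemble URL, headers and body for a chat-completions request."""
    url = f"{base_url}/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_chat_request(
    "example/llama-2-7b-chat",            # placeholder model name
    "Summarize my meeting notes.",
)
# Sending it would be, for example:
#   import urllib.request
#   req = urllib.request.Request(url, payload.encode(), headers)
#   print(urllib.request.urlopen(req).read())
```

The appeal of the pre-built container model is exactly this: the app only needs to know one standard request shape, regardless of which of the supported models is running inside.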
NVIDIA AI Workbench is now generally available, helping developers quickly create, test and customize pretrained generative AI models and LLMs on RTX GPUs. It offers streamlined access to popular repositories like Hugging Face, GitHub and NVIDIA NGC, along with a simplified user interface that enables developers to easily reproduce, collaborate on and migrate projects.
Projects can be easily scaled up when additional performance is needed — whether to the data center, a public cloud or NVIDIA DGX Cloud — and then brought back to local RTX systems on a PC or workstation for inference and light customization. AI Workbench is a free download and provides example projects to help developers get started quickly.
These tools, and many others announced and shown at GTC, are helping developers drive innovative AI solutions.
From the Blackwell platform’s arrival to a digital twin for Earth’s climate, it’s been a GTC to remember. For RTX PC and workstation users and developers, it was also a glimpse into what’s next for generative AI.
See notice regarding software product information.
Jesse Clayton
Product Manager, Mobile Embedded, NVIDIA