Tenyks

Video Understanding: Qwen2-VL, An Expert Vision-language Model

This article was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. Qwen2-VL, an advanced vision-language model built on Qwen2 [1], sets new benchmarks in image comprehension across varied resolutions and aspect ratios, while also tackling extended video content. Though Qwen2-VL excels on many fronts, this article explores the model’s […]

Scalable Video Search: Cascading Foundation Models

This article was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. Video has become the lingua franca of the digital age, but its ubiquity presents a unique challenge: how do we efficiently extract meaningful information from this ocean of visual data? In Part 1 of this series, we navigate […]

Zero-Shot AI: The End of Fine-tuning as We Know It?

This article was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. Models like SAM 2, LLaVA or ChatGPT can perform tasks without task-specific training. This has people wondering whether the old way of training AI (i.e., fine-tuning) is becoming outdated. In this article, we compare two models: YOLOv8 (fine-tuning) […]

SAM 2 + GPT-4o: Cascading Foundation Models via Visual Prompting (Part 2)

This article was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. In Part 2 of our Segment Anything Model 2 (SAM 2) series, we show how foundation models (e.g., GPT-4o, Claude 3.5 Sonnet and YOLO-World) can be used to generate visual inputs (e.g., bounding boxes) for SAM 2. Learn […]
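
To make the cascading idea concrete, here is a minimal sketch assuming the public sam2 package from the Segment Anything Model 2 repository: a first-stage model proposes bounding boxes, and SAM 2 consumes them as box prompts. The detect_boxes helper is hypothetical (a stand-in for YOLO-World, GPT-4o, or any other box-producing model), and the config and checkpoint names are assumptions, not part of the original article.

```python
# Sketch of a two-stage cascade: detector boxes -> SAM 2 masks.
import numpy as np
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

image = np.array(Image.open("frame.jpg").convert("RGB"))

# Stage 1: any model that returns [x0, y0, x1, y1] boxes.
# `detect_boxes` is a hypothetical placeholder for YOLO-World, GPT-4o, etc.
boxes = detect_boxes(image)

# Stage 2: SAM 2 turns each box prompt into a segmentation mask.
# Config/checkpoint names are assumptions; use the files you actually downloaded.
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt"))
predictor.set_image(image)
masks, scores, _ = predictor.predict(box=np.array(boxes), multimask_output=False)
```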

SAM 2 + GPT-4o: Cascading Foundation Models via Visual Prompting (Part 1)

This article was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. In Part 1 of this article, we introduce Segment Anything Model 2 (SAM 2). Then, we walk you through how you can set it up and run inference on your own video clips. Learn more about visual prompting […]
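
As a preview of the setup described above, below is a minimal sketch of running SAM 2 on a clip, assuming the sam2 package and its video predictor; the config/checkpoint names, frame directory, and example point prompt are assumptions, and the method names follow the repository’s video example at the time of writing.

```python
# Sketch: prompt SAM 2 on one frame, then propagate masks through the clip.
import torch
from sam2.build_sam import build_sam2_video_predictor

# Config and checkpoint names are assumptions; point them at your downloads.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "sam2_hiera_large.pt")

with torch.inference_mode():
    # The clip is expected as a directory of JPEG frames.
    state = predictor.init_state(video_path="./my_clip_frames")

    # One positive click on frame 0 identifies the object to track (coords are made up).
    predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=[[420.0, 260.0]],
        labels=[1],  # 1 = foreground click
    )

    # Propagate the prompt through the rest of the video.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # boolean masks per tracked object
```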

RAG for Vision: Building Multimodal Computer Vision Systems

This blog post was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. This article explores the exciting world of Visual RAG, covering its significance and how it’s revolutionizing traditional computer vision pipelines. From understanding the basics of RAG to its specific applications in visual tasks and surveillance, we’ll examine […]

Multimodal Large Language Models: Transforming Computer Vision

This blog post was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. This article introduces multimodal large language models (MLLMs) [1], their applications using challenging prompts, and the top models reshaping computer vision as we speak. What is a multimodal large language model (MLLM)? In layman’s terms, a multimodal […]

DALL-E vs Gemini vs Stability: GenAI Evaluations

This article was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. We performed a side-by-side comparison of three models from leading providers in Generative AI for Vision. This is what we found: despite the subjectivity involved, human evaluation is still the best approach for evaluating state-of-the-art GenAI Vision […]

Improving Vision Model Performance Using Roboflow and Tenyks

This blog post was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. When improving an object detection model, many engineers focus solely on tweaking the model architecture and hyperparameters. However, the root cause of mediocre performance often lies in the data itself. In this collaborative post between Roboflow and […]

NVIDIA TAO Toolkit: How to Build a Data-centric Pipeline to Improve Model Performance (Part 3 of 3)

This blog post was originally published at Tenyks’ website. It is reprinted here with the permission of Tenyks. During this series, we will use Tenyks to build a data-centric pipeline to debug and fix a model trained with the NVIDIA TAO Toolkit. Part 1. We demystify the NVIDIA ecosystem and define a data-centric pipeline based […]
