Comparing State of the Art Region of Interest Trackers


This blog post was originally published by Teleidoscope. It is reprinted here with the permission of Teleidoscope.

In our previous post we discussed the various types of computer-vision-based tracking. At Teleidoscope we’ve dedicated significant time and effort to building a fast and robust Region of Interest (ROI) tracker, which we license as is and also use as the foundation for custom solutions built to client needs.

This post examines and compares state-of-the-art ROI trackers, including our own.

Recap: Region of Interest Tracking

ROI trackers are used for tracking arbitrary moving objects or regions of an image. This can be done with or without prior training data.

Examples include tracking a surfer with a consumer drone, or using a gimbal to help a camera follow a person.


TIO — Tracking a surfer

This method works by using information from previous frames to estimate where the object is likely to be in the current frame. This allows the algorithm to search only a small area instead of searching the full image, as a traditional detector-based approach does.
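To make the "small search area" idea concrete, here is a minimal Python sketch. It assumes boxes are given as (x, y, width, height) in pixels, and the `scale` parameter (how much larger than the object the window is) is a hypothetical choice for illustration, not a value taken from any of the trackers discussed here.

```python
def search_window(prev_box, frame_size, scale=2.0):
    """Return a search region centred on the previous box, clamped to the frame.

    prev_box is (x, y, w, h) in pixels; frame_size is (width, height).
    `scale` is an assumed padding factor, not a value from a real tracker.
    """
    x, y, w, h = prev_box
    frame_w, frame_h = frame_size
    cx, cy = x + w / 2, y + h / 2            # centre of the last known position
    half_w, half_h = w * scale / 2, h * scale / 2
    x0 = max(0, cx - half_w)                 # clamp each edge to the image
    y0 = max(0, cy - half_h)
    x1 = min(frame_w, cx + half_w)
    y1 = min(frame_h, cy + half_h)
    return (round(x0), round(y0), round(x1 - x0), round(y1 - y0))


def searched_fraction(window, frame_size):
    """Fraction of the frame's pixels covered by the search window."""
    _, _, w, h = window
    return (w * h) / (frame_size[0] * frame_size[1])


window = search_window((100, 80, 40, 60), (640, 480))
print(window, searched_fraction(window, (640, 480)))
```

With these example numbers the window covers only a few percent of a 640x480 frame, which is where the speed advantage over a full-image detector comes from.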

Comparisons

There are many ROI tracking solutions out there, but most don’t work well in everyday situations. Below we have benchmarked and compared our tracker against three others that are considered state of the art. We will start with the video comparisons, then go into the details of each, along with its pros and cons.
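This post does not say how the trackers were scored. One widely used way to compare a tracker's output against hand-labelled boxes is intersection over union (IoU), so the sketch below uses that as an assumption. The function names, the 0.5 threshold and the sample boxes are all illustrative, not taken from our benchmark.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(predicted, ground_truth, threshold=0.5):
    """Share of frames where the tracker's box overlaps the labelled box enough.

    A `None` prediction means the tracker reported a failure on that frame.
    """
    hits = sum(
        1 for p, g in zip(predicted, ground_truth)
        if p is not None and iou(p, g) >= threshold
    )
    return hits / len(ground_truth)


# Made-up boxes for four frames: two good, one reported failure, one drifted.
predicted = [(10, 10, 50, 50), (14, 12, 50, 50), None, (80, 80, 50, 50)]
labelled = [(10, 10, 50, 50), (12, 12, 50, 50), (15, 14, 50, 50), (18, 16, 50, 50)]
print(success_rate(predicted, labelled))
```

Counting a reported failure as a miss keeps the score honest, but a fuller comparison would also track false positives separately, since a tracker that silently drifts is usually worse than one that admits it is lost.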


Comparing our tracker (TIO) to others considered state of the art



CSRT

This tracker works by training a correlation filter with compressed features (HOG and Color Names). The filter is then used to search the area around the last known position of the object in successive frames.

Pros:
– Slower but more accurate than KCF
– Robust to unpredictable motion
– Trained on a single image patch
– Can recover from failures when the object hasn’t moved much
– Can tolerate intermittent frame drops
– Reports unrecoverable failures
– Adapts to scale, deformation and rotation
– Manually adjustable parameters

Cons:
– Does not recover well from failures due to full occlusion
– Latches onto surrounding regions when partially occluded, resulting in drift
– Does not recover when the object changes while out of view
– Does not recover from multiple consecutive failures
– Does not incorporate motion into estimation

KCF

This tracker works by training a filter with patches containing the object as well as nearby patches that do not. This allows the tracker to search the area around the previous position and exploit the fact that nearby patches are likely to contain the object.
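The idea of training on the object patch plus its shifted neighbours can be sketched in one dimension. The Python below is a simplified, linear-kernel illustration and not OpenCV's KCF implementation: it treats every cyclic shift of a short signal as a training sample, gives the unshifted sample the highest label, and solves the resulting ridge regression per frequency. The signal, `sigma` and `lam` values are assumptions chosen for the demo.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for a short demo signal."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * math.pi * k * m / n)
                    for m, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def gaussian_labels(n, sigma=1.5):
    """Targets: 1.0 for the unshifted patch, falling off with cyclic shift."""
    return [math.exp(-min(j, n - j) ** 2 / (2 * sigma ** 2)) for j in range(n)]


def train(patch, lam=1e-3):
    """Ridge regression over all cyclic shifts of `patch`, solved per frequency."""
    patch_f = dft(patch)
    labels_f = dft(gaussian_labels(len(patch)))
    alpha_f = [y / (abs(x) ** 2 + lam) for x, y in zip(patch_f, labels_f)]
    return patch_f, alpha_f


def locate(model, search_patch):
    """Return the cyclic shift at which the filter response peaks."""
    patch_f, alpha_f = model
    search_f = dft(search_patch)
    response_f = [x.conjugate() * z * a
                  for x, z, a in zip(patch_f, search_f, alpha_f)]
    response = [r.real for r in dft(response_f, inverse=True)]
    return max(range(len(response)), key=response.__getitem__)


patch = [0, 0, 1, 3, 7, 9, 4, 1, 0, 0, 2, 0, 0, 1, 0, 0]
model = train(patch)
shift = 3
moved = patch[-shift:] + patch[:-shift]   # target moved 3 samples to the right
print(locate(model, moved))
```

Working in the frequency domain is what makes this family of trackers fast: all shifted samples are handled at once instead of being evaluated one by one. A real tracker would use 2D image features and an FFT library rather than this naive transform.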

Pros:
– 1.5–2x faster than CSRT and ~10x faster than TLD
– Adapts to scale and rotation
– Trained on a single image patch
– Aggressive failure reporting
– Manually adjustable parameters
– Supports custom feature extractor

Cons:
– Does not recover well from failures
– Does not recover when the object changes while out of view
– Does not recover from multiple consecutive failures
– Does not incorporate motion into estimation

TLD

This tracker works by training a classifier that is used to re-detect the object and correct tracking errors.
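One way to picture how a re-detecting classifier corrects a frame-to-frame tracker is a small arbitration step that runs on every frame. The Python sketch below is an illustration under assumptions, not TLD's actual code: the confidence scores, the `min_confidence` threshold and the function name are all hypothetical.

```python
def integrate(tracker_result, detections, min_confidence=0.5):
    """Pick the box to report for this frame.

    tracker_result: (box, confidence) from the frame-to-frame tracker,
                    or None if the tracker failed on this frame.
    detections:     list of (box, confidence) from the full-image detector.
    Returns (box, source), or (None, "lost") if nothing is confident enough.
    """
    candidates = []
    if tracker_result is not None:
        box, conf = tracker_result
        candidates.append((conf, box, "tracker"))
    candidates.extend((conf, box, "detector") for box, conf in detections)

    confident = [c for c in candidates if c[0] >= min_confidence]
    if not confident:
        return None, "lost"
    # Ties go to the tracker because it is listed first.
    _, box, source = max(confident, key=lambda c: c[0])
    return box, source


# The tracker has drifted (low confidence); the detector re-finds the object.
print(integrate(((10, 10, 40, 40), 0.3), [((12, 11, 40, 40), 0.8)]))
# The tracker failed outright; the detector recovers the object elsewhere.
print(integrate(None, [((200, 50, 40, 40), 0.6)]))
```

The same structure also shows where false positives can come from: whenever the detector's best candidate clears the threshold, it is reported, even if it is a look-alike region rather than the original object.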

Pros:
– Recovers from full occlusion
– Trained on a single image patch
– Adapts to scale and deformation
– Searches the entire image on failures, making it good for ‘general location’ reporting

Cons:
– Very frequent false positives
– Very unstable scale estimation
– Does not report failures well
– Very slow comparatively (60–100 ms)
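To put the speed figures side by side, the short Python calculation below combines the 60–100 ms figure for TLD with the relative speeds quoted in the KCF list. It assumes the TLD figure is a per-frame processing time and that the ratios hold across the whole range, so the KCF and CSRT numbers are rough derived estimates rather than measurements.

```python
def fps(ms_per_frame):
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


tld_ms = (60.0, 100.0)                          # quoted for TLD (assumed per frame)
kcf_ms = tuple(t / 10.0 for t in tld_ms)        # KCF: "~10x faster than TLD"
csrt_ms = (kcf_ms[0] * 1.5, kcf_ms[1] * 2.0)    # KCF: "1.5-2x faster than CSRT"

for name, (fast, slow) in [("TLD", tld_ms), ("KCF", kcf_ms), ("CSRT", csrt_ms)]:
    print(f"{name}: {fast:.0f}-{slow:.0f} ms, {fps(slow):.0f}-{fps(fast):.0f} FPS")
```

Under these assumptions TLD falls below typical 30 FPS video rates, while KCF and CSRT leave headroom.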

TIO (ours)

This tracker works by learning the texture, color, shape and surroundings of an object as they change over time.

Pros:
– Recovers from full occlusion
– Long term tracking
– Remains stable under partial occlusion
– Adapts to scale, deformation and rotation
– Supports manual or automatic ideal parameter tuning
– Supports manual feature selection
– Is able to recover from changes that occur to an object while it is out of view
– Trained on a single image patch
– Reports when it is searching for a lost object after a soft failure
– Supports multiple object initialization and tracking
– Auto-detects drift or learning errors and corrects if needed
– Can run in realtime on mobile hardware
– Can take advantage of depth sensors or IMUs to improve accuracy
– Robust to severe frame drops and long term target invisibility
– Optimized to support high FPS and high resolution cameras
– Can take advantage of camera focus and exposure events if available
– Can be initialized on moving objects

Cons:
– Large objects that take up more than 40% of the image may result in a drop in performance
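An application can check for this case before initializing a tracker. The Python sketch below assumes that "40% of the image" refers to the share of pixel area covered by the bounding box; the function names are hypothetical and not part of any TIO API.

```python
def roi_area_fraction(box, frame_size):
    """Share of the frame's area covered by an (x, y, w, h) box."""
    _, _, w, h = box
    return (w * h) / (frame_size[0] * frame_size[1])


def is_large_target(box, frame_size, limit=0.40):
    """True if the box exceeds the area share where performance may drop."""
    return roi_area_fraction(box, frame_size) > limit


frame = (640, 480)
print(is_large_target((0, 0, 320, 240), frame))   # a quarter of the frame
print(is_large_target((0, 0, 512, 480), frame))   # most of the frame
```

If the check fires, one option is to downscale the frame before tracking so the object covers fewer pixels, although whether that helps depends on the tracker.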


TIO provides the accuracy of CSRT and the speed of KCF, and it recovers from failures better than TLD without constant false positives. Know someone who needs a robust tracking solution? Please reach out to me at [email protected].

Written with the help of our lead computer vision engineer Eric Lundquist

Matt Rabinovitch
Founder and CEO, Teleidoscope
