TLD, aka Predator, Kinect-like Open Source Tracking Algorithms

The Microsoft Kinect peripheral for the Xbox 360 is more than just a fancy camera with a microphone attached. It’s a combination of hardware and software that work together to pull tracking information out of the device and pass it along to whatever it is connected to. We’ve covered the anatomy of the algorithms that do the tracking before, but they’re still proprietary to Microsoft.

So far, open source access to these algorithms has come through drivers that let developers talk to the Kinect. The next best step would be to have the algorithms themselves available as open source, so that the world community could study them and make them better.

Well, move over, Kinect. Zdenek Kalal, a Czech computer scientist working on his PhD at the University of Surrey in England, wants to enable computers to “see” and has come up with an error-correcting tracking algorithm.

In his demonstration, he talks about TLD (Tracking-Learning-Detection), aka Predator, and how its object tracking algorithm works by predicting the movements of an object in the camera’s field of view and taking a series of sequential snapshots of how the object changes as it moves. As light, angles, depth, and motion skew and morph the camera’s view of the object, the algorithm adds to its model of what the object looks like, which lets it recognize the object and keep following it.
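To make the track, learn, detect loop a little more concrete, here is a toy sketch in plain Python. It is not Kalal’s implementation: it swaps in brute-force patch matching for his tracker and detector, and every name in it (`ToyTLD`, `make_frame`, the `accept` threshold) is made up for illustration. Frames are tiny grids of grayscale values. The tracker looks only near the last known position, the detector scans the whole frame against every appearance learned so far, and each newly seen appearance gets added to the model.

```python
def make_frame(height, width, obj=None, pos=None):
    """Build a black frame and optionally paste a small object patch into it."""
    frame = [[0] * width for _ in range(height)]
    if obj is not None:
        top, left = pos
        for r, row in enumerate(obj):
            for c, value in enumerate(row):
                frame[top + r][left + c] = value
    return frame

def patch(frame, pos, size):
    """Cut a size x size patch out of the frame at (top, left)."""
    top, left = pos
    return [row[left:left + size] for row in frame[top:top + size]]

def distance(a, b):
    """Sum of absolute pixel differences between two equal-sized patches."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def all_positions(frame, size):
    """Every (top, left) where a size x size patch fits inside the frame."""
    return [(t, l) for t in range(len(frame) - size + 1)
            for l in range(len(frame[0]) - size + 1)]

def nearby_positions(frame, size, pos, radius=1):
    """Valid positions within `radius` pixels of the last known position."""
    top, left = pos
    return [(t, l) for t, l in all_positions(frame, size)
            if abs(t - top) <= radius and abs(l - left) <= radius]

def best_match(frame, templates, size, candidates):
    """Return (score, position) of the candidate closest to any template."""
    best = None
    for pos in candidates:
        candidate = patch(frame, pos, size)
        score = min(distance(candidate, t) for t in templates)
        if best is None or score < best[0]:
            best = (score, pos)
    return best

class ToyTLD:
    """Hypothetical toy tracker; `accept` is an arbitrary match threshold."""

    def __init__(self, frame, pos, size, accept=40):
        self.size, self.pos, self.accept = size, pos, accept
        self.templates = [patch(frame, pos, size)]  # learned appearances

    def update(self, frame):
        match = None
        if self.pos is not None:
            # Track: compare the latest appearance against nearby positions.
            match = best_match(frame, self.templates[-1:], self.size,
                               nearby_positions(frame, self.size, self.pos))
        if match is None or match[0] > self.accept:
            # Detect: tracker failed, so scan the whole frame with all templates.
            match = best_match(frame, self.templates, self.size,
                               all_positions(frame, self.size))
        if match is None or match[0] > self.accept:
            self.pos = None  # object is not visible in this frame
            return None
        _, self.pos = match
        # Learn: remember how the object looks now, if that is new.
        seen = patch(frame, self.pos, self.size)
        if seen not in self.templates:
            self.templates.append(seen)
        return self.pos

original = [[90, 80], [70, 60]]
brighter = [[95, 85], [75, 65]]  # same object under slightly different light
tracker = ToyTLD(make_frame(6, 6, original, (1, 1)), (1, 1), size=2)
for frame in (make_frame(6, 6, brighter, (2, 2)),   # small move: tracked
              make_frame(6, 6),                     # object leaves the view
              make_frame(6, 6, brighter, (0, 4))):  # reappears elsewhere
    print(tracker.update(frame))
```

The design choice worth noticing is the fallback: the cheap local search runs first, and the full-frame scan only kicks in when the object is lost, which is what lets the toy pick the object back up after it leaves the frame.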

While this isn’t entirely how the Kinect works (the Kinect is a more elaborate setup for recognizing human bodies and objects in motion), the two have a lot of elements in common.

Zdenek even goes so far as to demo a few extremely useful effects of the algorithm.

Subject-video stabilization: the ability to stabilize a wide-shot camera view around the movement of a single object. In other words, it can make it look like the camera is exactly following the guy dodging and weaving through traffic, when really the cameraman only has him loosely framed. If nothing else, this sort of thing could make it easier for amateur filmmakers to produce interesting content, or for web cameras to better pick out and hold subjects in view during meetings.
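One simple way to get this effect, once a tracker reports where the subject is in each frame, is to crop a smaller window out of the wide shot and keep that window centered on the subject. The sketch below is an assumption about how that could be done, not a description of the demo’s code; `crop_windows` and its `smoothing` parameter are hypothetical names. It smooths the subject’s position a little so the crop does not jitter, and clamps the window so it never leaves the frame.

```python
def crop_windows(centers, frame_size, crop_size, smoothing=0.5):
    """Return the (left, top) corner of a crop window for each frame.

    centers    -- tracked (x, y) position of the subject in each frame
    frame_size -- (width, height) of the full wide shot
    crop_size  -- (width, height) of the stabilized output
    smoothing  -- 0 follows the subject exactly, values near 1 move slowly
    """
    frame_w, frame_h = frame_size
    crop_w, crop_h = crop_size
    smooth_x, smooth_y = centers[0]
    windows = []
    for x, y in centers:
        # Exponential smoothing keeps the virtual camera from jittering.
        smooth_x = smoothing * smooth_x + (1 - smoothing) * x
        smooth_y = smoothing * smooth_y + (1 - smoothing) * y
        # Center the crop on the subject, then clamp it inside the frame.
        left = min(max(round(smooth_x - crop_w / 2), 0), frame_w - crop_w)
        top = min(max(round(smooth_y - crop_h / 2), 0), frame_h - crop_h)
        windows.append((left, top))
    return windows

# Subject drifts right, then jumps toward the bottom-right corner.
path = [(320, 240), (340, 240), (630, 470)]
print(crop_windows(path, frame_size=(640, 480), crop_size=(200, 200)))
```

Cutting each frame at the returned corner produces footage in which the subject stays near the middle of the picture, at the cost of some resolution from the original wide shot.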

Facial recognition also comes into play. Because the algorithm is capable of forming a highly detailed picture of a moving object at various depths and angles, it can be used to find the same face again in a crowd. In Zdenek’s demo, he selected his face, let the algorithm “learn” to recognize him while he was on screen, and then had it pick his face out of a yearbook page. Coupled with social media, it could make it possible to scan videos for people and pick them out quickly (either that or it would make for a hilarious episode of CSI or NCIS). This sort of technology might be of interest to Yandex as well when it comes to facial recognition. Not to mention that combining it with biometrics for Windows login with Blink! might be fun (although unlocking a computer with a yearbook photo would be a bad precedent for security technology).

It’s a wonderful little project, and because it’s designed to work with off-the-shelf, consumer-grade webcams, it could either supplant or augment the Kinect’s current homebrew-fueled fervor.

[Thanks for finding this video, ReadWriteHack!]