This article was originally published at NVIDIA's blog. It is reprinted here with the permission of NVIDIA.
Researchers have advanced computerized object recognition to once-unfathomable levels, thanks to GPUs.
Building on the work of neural network pioneers Kunihiko Fukushima and Yann LeCun – and more recent efforts by teams at the University of Toronto – New York University researchers have used GPUs to dramatically improve the accuracy of earlier object-recognition efforts.
Rob Fergus, an NYU associate professor of computer science, told a packed room at NVIDIA’s GPU Technology Conference that GPUs helped his team build models that enable computers to understand what they’re looking at 50 times faster than with CPUs alone.
That, in turn, cut the error rate for identifying objects in complex images from the 26 percent of earlier approaches to 16 percent, a result posted at the 2012 ImageNet Large Scale Visual Recognition Challenge.
“This was a big surprise, and something that does not happen regularly in our field,” Fergus told more than 200 GTC attendees.
And yet, in 2013, one of Fergus’ prized students, Matt Zeiler, pushed that figure down further, to less than 12 percent.
Not surprisingly, when a GTC attendee asked for the source code, Fergus said that Zeiler was planning to launch a startup and was thus protective of his code.
He’ll have company, as the work of Fergus’ team has received funding from Microsoft, Google and Facebook. In fact, Fergus has joined neural network pioneer LeCun on Facebook’s new artificial intelligence research team, on a part-time basis.
“These models are starting to work and really be useful, which is why companies like Facebook and Google are starting to use them,” Fergus said during his GTC talk.
The idea behind so-called “convolutional models” for object recognition is to train networks to discriminate between different types of objects through a series of layers, each of which builds on the features extracted by the layer before it. (The “convolution” operation at the core of each layer is closely related to cross-correlation.)
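To make that operation concrete, here is a minimal NumPy sketch of the 2-D cross-correlation that sits at the heart of a convolutional layer. It is an illustration only, not code from Fergus’ team, and the toy image and hand-set kernel are assumptions for the example.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation: slide the kernel over the
    image and take a weighted sum at each position. (True convolution
    flips the kernel first; deep-learning layers usually skip the flip,
    which is why the two terms get used interchangeably.)"""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Toy 6x6 "image" with a vertical edge, and a hand-set edge-detecting
# kernel; in a trained network the kernel values are learned, not set.
image = np.array([[0, 0, 0, 9, 9, 9]] * 6, dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

print(conv2d(image, kernel))  # strong response along the vertical edge
```

In a full network, many such learned kernels run in parallel at each layer, and later layers apply the same operation to the feature maps produced by earlier ones, which is the layer-on-layer structure described above.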
Now if only researchers fully understood the models they’re building.
“There’s a lack of good theoretical grounding as to why these models work well, and how we can make them work better,” said Fergus.
Along those lines, Fergus and his research team have experimented with removing individual layers from their networks, often with little loss in accuracy. However, Fergus said removing too many layers results in “catastrophic loss of performance.”
One important thing to note is that the models Fergus presented all relied on supervised learning, in which the training data is labeled so that the network can be taught. The next frontier will be getting networks to learn from images that aren’t labeled.
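As a toy illustration of what “supervised” means in practice, the sketch below fits a simple logistic-regression classifier to labeled points; the dataset and labeling rule are hypothetical, standing in for the far larger labeled-image setting Fergus described, and this is not his team’s code. The key point is that the provided labels directly drive every weight update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled dataset: 200 two-dimensional points, each paired with a
# label. "Supervised" means the correct labels are supplied up front.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # hypothetical labeling rule

# Fit a logistic-regression classifier by gradient descent on log loss.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= lr * (X.T @ (p - y)) / len(y)      # push predictions toward labels
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
print(f"training accuracy: {np.mean(pred == y):.2f}")
```

Without the labels `y`, there is no error signal to descend on, which is exactly the difficulty unsupervised learning has to overcome.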
“How do you get unsupervised learning to work?” asked Fergus. “That’s the big question everyone is trying to solve.”
If you want to see how the models built by Fergus’ team work, upload some images to the online demo he’s posted at horatio.cs.nyu.edu and see what kind of results you can get. Hint: Photos of easily recognizable objects work best.