Yann LeCun, Director of AI Research at Facebook and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, presents the "Convolutional Networks: Unleashing the Potential of Machine Learning for Robust Perception Systems" keynote at the May 2014 Embedded Vision Summit.
Convolutional Networks (ConvNets) have become the dominant method for a wide array of computer perception tasks, including object detection, object recognition, face recognition, image segmentation, visual navigation, and handwriting recognition, as well as acoustic modeling for speech recognition and audio processing. ConvNets have been widely deployed for such tasks over the last two years by companies such as Facebook, Google, Microsoft, NEC, IBM, Baidu, and Yahoo, sometimes achieving levels of accuracy that rival human performance.
ConvNets are composed of multiple layers of filter banks (convolutions) interspersed with point-wise non-linearities and spatial pooling and subsampling operations. ConvNets are a particular embodiment of the concept of "deep learning" in which all the layers in a multi-layer architecture are subject to training. This is unlike more traditional pattern recognition architectures that are composed of a (non-trainable) hand-crafted feature extractor followed by a trainable classifier. Deep learning allows us to train a system end to end, from raw inputs to ultimate outputs, without the need for a separate feature extractor or pre-processor.
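The layer structure described above can be illustrated with a minimal sketch of a single ConvNet stage: a convolution (one filter from a filter bank), a point-wise non-linearity (ReLU is used here as a common choice), and 2x2 max pooling for spatial subsampling. The input and kernel values are purely illustrative and not from the talk; a real system would learn the filter weights end to end.

```python
# Minimal sketch of one ConvNet stage: convolution, point-wise
# non-linearity (ReLU), and 2x2 max pooling with stride 2.
# All values below are illustrative, not from the presentation.

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(image[i + u][j + v] * kernel[u][v]
                            for u in range(kh) for v in range(kw))
    return out

def relu(fmap):
    """Point-wise non-linearity applied to every feature-map value."""
    return [[max(0.0, x) for x in row] for row in fmap]

def max_pool2(fmap):
    """2x2 max pooling with stride 2 (spatial subsampling)."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Hypothetical example: a 6x6 checkerboard input and a 3x3 Laplacian-like kernel.
image = [[float((i + j) % 2) for j in range(6)] for i in range(6)]
kernel = [[0.0,  1.0, 0.0],
          [1.0, -4.0, 1.0],
          [0.0,  1.0, 0.0]]

# One full stage: 6x6 input -> 4x4 conv output -> ReLU -> 2x2 pooled map.
feature_map = max_pool2(relu(conv2d(image, kernel)))
```

In a full ConvNet, many such stages are stacked, with each stage applying a whole bank of learned filters across all input channels, which is what makes the architecture trainable end to end rather than relying on a hand-crafted feature extractor.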
This presentation demonstrates several practical applications of ConvNets. ConvNets are especially easy to implement in hardware, particularly using dataflow architectures. A design called NeuFlow is described. Large-scale ConvNets for image labeling have been demonstrated running on an FPGA implementation of the NeuFlow architecture. ConvNets bring the promise of real-time embedded systems capable of impressive image recognition tasks, with applications to smart cameras, mobile devices, automobiles, and robots.