Is Deep Learning the Solution to All Computer Vision Problems?

This blog post was originally published in the late September 2017 edition of BDTI's InsideDSP newsletter. It is reprinted here with the permission of BDTI.

At the Embedded Vision Summit in May, I had the privilege of hearing a brilliant keynote presentation from Professor Jitendra Malik of UC Berkeley. Malik, whose research and teaching have helped shape the field of computer vision for 30 years, explained that he had been skeptical about the value of deep neural networks for computer vision, but ultimately changed his mind in the face of a growing body of impressive results.

There’s no question that deep neural networks (DNNs) have transformed the field of computer vision. DNNs are delivering superior results on recognizing objects, localizing objects within a frame, and determining which pixels belong to which object. Even problems like optical flow and stereo correspondence, which had been solved quite well with conventional techniques, are now finding better solutions using deep learning techniques. And the success of deep learning goes well beyond computer vision, to tasks like speech recognition.

As a result of these impressive successes, deep learning has attracted huge attention and investment, both among researchers and in industry. This focus and investment are accelerating progress both in deep learning algorithms and in techniques to implement these algorithms efficiently, enabling them to be integrated into a growing range of systems, including those with significant cost and power constraints.

This naturally raises the question: If you are incorporating computer vision functionality into your system or application, should you consider anything other than deep learning? In BDTI’s consulting practice, we’re increasingly hearing from clients who want to solve a computer vision problem using deep learning. But we’ve found that in some cases, other types of algorithms are preferable. Why?

First, the visual world is infinitely varied, and there are an infinite number of ways in which system designers can use visual data. A few of these use cases, like object recognition and localization, are well addressed by published deep learning techniques. So, if your application requires an algorithm to recognize furniture, for example, you’re in luck: You can select a deep neural network algorithm from the published literature and retrain it with your own data set.

But let’s talk about that data set for a moment. Training data is critical to effective deep learning algorithms. Training a DNN typically requires many thousands of labeled training images (i.e., images labeled with the desired output), and many thousands more labeled images for evaluating candidate trained algorithms. And, of course, the nature of this data is important: the training and validation data must represent the diversity of cases that the algorithm is expected to handle. If obtaining enough diverse training data is difficult or impossible, you may be better off with conventional techniques.
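One small, concrete piece of this requirement can be sketched in code. The Python snippet below is a minimal illustration, not a recommended pipeline: it splits labeled images into training and validation sets so that every label appears in both. The function name `stratified_split`, the `val_fraction` parameter, and the `(image_id, label)` sample format are all assumptions made for this example. Note that it only guards against a label being absent from one of the sets; it cannot tell you whether the images themselves cover the lighting, viewpoints and backgrounds your system will encounter.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (image_id, label) pairs so every label lands in both sets.

    Hypothetical helper for illustration only.
    """
    # Group image IDs by their label.
    by_label = defaultdict(list)
    for image_id, label in samples:
        by_label[label].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, val = [], []
    for label, ids in by_label.items():
        if len(ids) < 2:
            raise ValueError(f"label {label!r} needs at least 2 samples")
        rng.shuffle(ids)
        # Hold out at least one sample per label, but never all of them.
        n_val = min(len(ids) - 1, max(1, round(len(ids) * val_fraction)))
        val += [(image_id, label) for image_id in ids[:n_val]]
        train += [(image_id, label) for image_id in ids[n_val:]]
    return train, val
```

A label with only a handful of images, as in a call such as `stratified_split([("img_0", "chair"), ("img_1", "chair"), ("img_2", "table")])`, triggers the error path here, which is often an early sign that the data set is too thin for that class.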

Another reason to consider techniques other than DNNs is if you need to perform a computer vision task that hasn’t yet been addressed by a DNN algorithm in the published literature. In this scenario, you could try to use an existing DNN algorithm that was created for another purpose. Or you could try to create a new DNN algorithm tailored to your requirements. Either way, you’re in the realm of research. This can be daunting, because few people and organizations have experience developing novel deep neural network algorithms. And, it’s difficult to know whether you’ll succeed within the available time, effort and computing resources.

When we delve into our customers’ requirements, we often find that what starts out looking like a single visual perception problem can be broken down into several sub-tasks. Frequently, some of these sub-tasks are a natural fit for DNNs, while others are not. For these projects, a solution that combines DNNs and conventional techniques is usually a better approach than trying to force the entire problem into a DNN solution.

It’s also important to remember that machine learning techniques are many and varied. Long before deep neural networks became popular, other machine learning techniques (such as support vector machines) were being used to good effect on many vision problems, and they remain useful today.

Given the huge investments being made in DNN research and technology, it’s clear that the range of problems for which DNNs are the preferred solution will continue to expand rapidly. Nevertheless, for the foreseeable future, many applications will be best served by conventional techniques (including other forms of machine learning), or by a combination of deep learning and conventional algorithms.

Jeff Bier
Co-Founder and President, BDTI
Founder, Embedded Vision Alliance
