This blog post was originally published in the mid-November 2016 edition of BDTI's InsideDSP newsletter. It is reprinted here with the permission of BDTI.
For humans, it goes without saying that vision is extremely valuable. When you stop to think about it, it’s remarkable what a diverse set of capabilities is enabled by human vision – from reading facial expressions, to navigating complex three-dimensional spaces (whether by foot, bicycle, car, or otherwise), to performing intricate tasks like threading a needle.
One reason I’m so excited about the potential of computer vision is that I believe it will bring a similarly diverse and valuable range of capabilities to many types of devices and systems. In the past, computer vision required too much computation to be deployed widely. But today, sufficient processing power is available at cost and power consumption levels suitable for high-volume products. As a result, computer vision is proliferating into thousands of products.
The sheer variety of capabilities enabled by vision (from user interfaces to video summarization to navigation, for example), coupled with the wide range of potential applications, can be daunting. How do we figure out which of these capabilities and applications are really worthwhile, and which are mere novelties?
I think the analogy with biological vision can help. In a recent lecture, U.C. Berkeley professor Jitendra Malik pointed out that in biological evolution, “perception arises with locomotion.” In other words, organisms that spend their lives in one spot have little use for vision. But when an organism can move, vision becomes very valuable – enabling the organism to seek food and mates, for example, and to avoid becoming food for other creatures.
In the technological world, to paraphrase Professor Malik, when you put vision and locomotion together, you get things like self-driving cars. And vacuum-cleaning robots, obstacle-avoiding drones, driverless forklifts, and so on. It’s possible to build autonomous, mobile devices like these without vision, but it rarely makes sense to do so. In other words, just as in the biological world, vision becomes essential when we create devices that move about.
What other clues can we glean from biology to inform our thinking about the most valuable uses of computer vision? In his lecture, Professor Malik pointed out that in biological evolution, “the development of the hand led to the development of the brain.” While feet carry us from place to place, hands are arguably the main means by which humans act on the physical world. Human hands are extraordinarily versatile – and vision is essential to realizing their potential.
Similarly, machines that act on the physical world require visual perception to realize their full potential. For years, this has been evident through research projects showing that vision-enabled robots can do amazing things, from the robot that always wins at Rock, Paper, Scissors to robots that learn how to grasp new objects through experimentation. What’s exciting now is that robots that use vision to act on the physical world are being deployed at scale, from tiny interactive toys to large agricultural machines. Of course, not all of these robots have what we would think of as “hands”; depending on the tasks they’re designed for, other types of manipulators may be appropriate.
In his lecture, Professor Malik quoted the Greek philosopher Anaxagoras, who said: “It is because of being armed with hands that man is the most intelligent animal.” Similarly, as machines gain the ability to interact with the physical world, they need intelligence – especially visual intelligence – to become truly capable.
I’ve long admired Professor Malik’s insights, and so I’m extremely pleased that he will be a keynote speaker at the 2017 Embedded Vision Summit, taking place May 1-3, 2017 in Santa Clara, California. My colleagues and I at the Embedded Vision Alliance are putting together a fascinating program of presentations and demonstrations, with emphasis on deep learning, 3D perception, and energy-efficient implementation – exactly the types of things needed to enable the next generation of intelligent machines. If you’re interested in implementing visual intelligence in real-world devices, mark your calendar and plan to be there. You can sign up for Summit updates on the Alliance website.
Jeff Bier
Co-Founder and President, BDTI
Founder, Embedded Vision Alliance