Is the Future of Machine Vision Already Here? Business Insights from the 2016 Embedded Vision Summit


This article was originally published by Embedded Vision Alliance consultant Dave Tokic. It is reprinted here with Tokic's permission.

My biggest “aha moment” out of many during the 2016 Embedded Vision Summit came right at the beginning. I was a front-row participant as moderator of the two-day Business Insights track and have been closely involved in this space for a number of years. But this Summit was different.

Here’s my take on the event, with links below for more info. Please comment to add your thoughts to the discussion.

What was science fiction or research is now becoming real and practical. It will be transformative, creating new value chains and business opportunities. It will impact our lives, changing how we interact with each other, with information, and with machines for increased safety and efficiency. And it will be everywhere. So back to that first “aha”.

Jeff Dean, Google Senior Fellow and head of the Brain team that recently released the TensorFlow machine learning library as open source, gave the opening-day keynote on deep learning, a powerful class of machine learning that allows machines to understand what they are seeing. These algorithms are trained by exposing them to huge amounts of data, e.g., thousands of tagged pictures of cars, people, or animals, so they learn to recognize a new, similar image very accurately, even if the object is somewhat obscured. They are now even able to interpret context: it's not just a picture of a “baby”, but the algorithm comes back with “A baby is asleep next to a teddy bear”. Quite impressive.
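
For a concrete (if greatly simplified) picture of what "recognizing a new image" looks like in code, here is a minimal inference sketch using PyTorch and torchvision; the ResNet-50 model and the file name are illustrative assumptions, not the specific models discussed in the keynote.

```python
# Minimal image-classification sketch (illustrative only): a pretrained
# ResNet-50 from torchvision stands in for the much larger models discussed
# in the keynote, and "family_photo.jpg" is a placeholder path.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet-style preprocessing: resize, crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True)  # weights learned from ~1.2M tagged images
model.eval()

img = Image.open("family_photo.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)      # add a batch dimension

with torch.no_grad():
    probs = torch.nn.functional.softmax(model(batch)[0], dim=0)

top5 = torch.topk(probs, 5)               # most likely class IDs and confidences
print(top5.indices.tolist(), [round(p, 3) for p in top5.values.tolist()])
```

Training is the expensive part; once the network's weights have been learned from those tagged examples, inference like this is what runs when the system "recognizes" a new photo.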

The first “aha” was just how accurate these systems are getting: better than a live person. Jeff shared that a recent neural-net approach to image classification had only a 3.46% error rate vs. 5.1% for a person… and it continues to get better. That means cars driving themselves better than humans, finding the right photos by searching “family vacation in Dubrovnik” rather than scrolling through all those folders, the door unlocking when your smart lock recognizes you, finding the injury or tumor more accurately in an MRI, and many other applications in automotive, robotics, drones, medical, security, home automation, finance, speech recognition, big data analytics, etc.

This was reinforced by Chris Rowen, CTO of the IP Group at Cadence, in his presentation on the road ahead for neural networks. While machine learning sits at the peak of Gartner’s Hype Cycle, its proliferation into our personal and professional lives across the applications mentioned above is inevitable. The current use in cloud computing will deepen and will quickly expand into real-time embedded applications, dominated by solving vision-based problems and evolving from just recognizing objects to driving action (avoid that pedestrian). Chris sees this as a key driver for much more powerful and energy-efficient (100x reduction) processing in the cloud (exa-MACs) and in embedded devices (peta-MACs), along with automated network optimization. New value chains will emerge, with access to large data sets acting as a disruptor and spurring another level of discussion and concern over privacy.

Bruce Daley, Principal Analyst for Artificial Intelligence at Tractica, put it succinctly: data is wealth (see his book by the same name). The data we’re capturing is growing exponentially, is impossible to analyze manually, and continues to increase in value throughout the product life cycle. This is where deep learning steps in. He presented numerous use cases, such as image identification for targeted advertising, image analysis for radiology and crop health, automated clothes sizing and fitting, and manufacturing quality control. The insightful takeaway was that the companies with the best business models, not just the best technology, will win, as we’re seeing in other spaces with companies like Uber, Airbnb, and Facebook. Bruce projects that machine learning will generate $10.4B in software revenue alone by 2024, with another $41B in hardware and $52B in services by that time.

Embedded vision enabled by machine learning will be as ubiquitous as wireless is today, stated Raj Talluri, SVP of IoT & Mobile Computing at Qualcomm. Advancements in low-power processing, machine learning, sensors, and the cloud have made machine vision real and are empowering a broad set of applications in automotive, mobile imaging, augmented / virtual reality (AR / VR), IP cameras, drones, and robotics, and will drive tremendous opportunity. An estimated 8.7B smartphones with vision technology will be shipped between 2016 and 2020. Demand for Advanced Driver Assist Systems (ADAS) will drive nearly $20B in revenue at a 19.2% CAGR between 2015 and 2020. Drones are estimated to reach 29M units and 31.3% revenue growth by 2021. Lastly, the AR / VR market is expected to disrupt how we interact with computing and devices, hitting $100B-$150B by 2020 (Digi-Capital, others).

Peter Shannon, managing director at Firelake Capital, brought his perspective on commercializing applications of computer vision as a long-time investor and former developer. A key challenge of computer vision is that it is based on inductive rather than deductive reasoning, i.e., the computer’s conclusion is only probably right. The same is true of speech recognition. The consequence of an error may be small, but that really depends on the application, e.g., a car that doesn’t recognize an obstruction and has an accident vs. Siri not hearing you quite right. Minimizing the risks involves careful tuning of the system’s software and hardware, and retuning as either is changed or upgraded… roughly 10% of the effort goes into the original algorithm and 90% into ensuring robustness.

Tim Hartley of ARM further dissected the problem of bringing mobile and embedded vision products to market, highlighting the considerations around sensor and processor choice, power and thermal tradeoffs, and enabling vision software development teams. Embedded / mobile vision products have tight performance-per-watt requirements but also a wide range of image-quality needs depending on the application. Limited cooling and often battery-only power may constrain the processor. OpenCV middleware libraries accelerate development but are tuned more for desktop applications, so embedded-tuned versions, potentially leveraging a framework based on Khronos' OpenVX, could be the answer.
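
To give a sense of why such libraries are attractive, here is a toy per-frame pipeline in OpenCV's Python bindings; the camera index and threshold values are arbitrary assumptions, and the stages chosen are just a simple stand-in.

```python
# Toy per-frame pipeline in OpenCV's Python bindings. Easy to prototype on a
# desktop; on a power-constrained embedded target, these stages are exactly
# what would need tuned kernels or an OpenVX-style graph.
# The camera index and thresholds are arbitrary assumptions.
import cv2

cap = cv2.VideoCapture(0)                             # default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    small = cv2.resize(frame, (640, 360))             # drop resolution to save cycles
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)    # single channel
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)       # suppress sensor noise
    edges = cv2.Canny(blurred, 50, 150)               # simple edge map as a stand-in
    cv2.imshow("edges", edges)
    if cv2.waitKey(1) & 0xFF == ord("q"):             # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```

On a desktop this runs out of the box; the tuning work Tim described is in getting those same stages to hit the frame rate and power budget of an embedded part.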

Carrying the theme further, computer vision development still requires hard-to-find PhDs and heavy algorithmic coding, as opposed to gaming, where advanced tools allow a less specialized programming community to support game development far more broadly. Paul Kruszewski of WRNCH walked the audience through how the gaming industry evolved from early tools and APIs (e.g., Khronos' OpenGL) through software frameworks and middleware (graphics, physics, etc.) to sophisticated engines and ecosystems (Unity, Unreal), and showed how computer vision is at the early stages of this democratization of development.

Probably the most visible (Most hyped? Most invested?) of the markets that machine vision and machine learning enables is automotive, with a huge impact on driver safety and efficiency through ADAS and self-driving cars.

Stefan Heck of autonomous driving and analytics provider Nauto shared that 95% of the 33,000 road fatalities, costing $300B annually, were caused by human error, and that the aggregate cost of driving a car is $3/mile, since the typical car spends 96% of its time parked and only 1% of the energy in the gas tank actually goes to moving the person, the rest lost to engine inefficiencies, idling, etc. Marco Jacobs of vision processor provider videantis highlighted that there are 1.2M driving deaths per year worldwide, with about 5 years of time wasted in the car, and a rough cost of $0.10/mile in insurance alone.

Stefan is driving the development of vision systems and analytics that capture real-time driver behavior (distraction, speed, driving style) in the context of surrounding traffic and road risks, as well as predictive mapping of the most efficient routes to destinations based on safety and congestion trends (e.g. that corner has historically had lots of accidents). Videantis is addressing the processing challenge posed by the growing number of cameras on the car, acknowledging that the transition to autonomy will be gradual over the next 5-10 years and dependent on increasing performance, lower power, and improving algorithms. Of course, while governments are now mandating rear cameras and some ADAS functions, there is still a significant road ahead before concerns around reliability and liability for self-driving cars are addressed.

Tom Wilson of NXP provided additional insight into the fusion of the various types of sensors being incorporated into cars, not only cameras but also radar, LiDAR, and ultrasound. He stated that no single sensor can meet the requirements of ADAS and autonomous cars, and that the key challenge will be how to partition fusion processing within the constraints of bandwidth and processing capability.

Expanding outside of automotive, Gerrit Fischer of Basler provided a deeper look at choosing the right camera subsystem, tearing down the camera and reviewing the details and tradeoffs of architecture, sensors, and interfaces. He concluded by emphasizing that volume and cost are the key factors in deciding between an off-the-shelf industrial imaging system and a more custom embedded solution.

Andreas Gal, founder & CEO of Silk Labs (and “father” of Firefox OS), focused his talk on Internet of Things (IoT) applications and the tradeoffs of whether visual intelligence processing should reside in the cloud or at the edge. Today’s smartphone supply chain makes any connected device we can dream of cheap to build, and video sensors are now everywhere. But training of deep neural networks still happens in the cloud, as does storage of personal images and video. This has big implications for privacy; the examples he gave were FindFace, a Russian service that can pull up personal info through social media from a single anonymous photo, and Shodan, a search engine for finding unsecured connected cameras, such as baby cams. He highlighted how a machine-learning edge device powered by a mobile-class processor, e.g. a camera-enabled smart door lock or security cam, can train simpler algorithms locally, still be accurate enough, and minimize what personal information is sent into the cloud.
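
A minimal sketch of that edge-first pattern might look like the following: a small, mobile-class network classifies a frame on the device, and only a label and a confidence score, never the raw image, are sent upstream. The MobileNet model, frame path, and endpoint URL are all hypothetical stand-ins, not Silk Labs' actual design.

```python
# Sketch of the edge-first pattern described above: a small, mobile-class
# network classifies frames on the device, and only a label and confidence
# score -- never the raw image -- leave the device. MobileNetV2, the frame
# path, and the endpoint URL are hypothetical stand-ins.
import json
import urllib.request

import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
model = models.mobilenet_v2(pretrained=True)   # small enough for edge hardware
model.eval()

def classify(path):
    """Return (class_id, confidence) for one frame, computed locally."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.nn.functional.softmax(model(x)[0], dim=0)
    conf, idx = probs.max(0)
    return int(idx), float(conf)

class_id, confidence = classify("doorbell_frame.jpg")     # placeholder frame
event = {"class_id": class_id, "confidence": confidence}  # metadata only

# Hypothetical cloud endpoint -- the frame itself never leaves the device.
req = urllib.request.Request(
    "https://example.com/events",
    data=json.dumps(event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```

The point is the data flow rather than the particular model: anything derived on the device (a label, a bounding box, an embedding) exposes far less than the footage itself.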

Virtual reality (and augmented reality) depends deeply on computer vision and is another highly anticipated and hyped market, with over 260 startups and 200,000 developers today, not to mention huge investments such as Facebook’s $2B acquisition of Oculus. Allen Rush, AMD Fellow, shared his perspective on the path VR is taking toward fully immersive presence, the necessary VR content, and key applications. Processing performance, algorithms, and displays still need to improve to reduce latency (move your head or hand and the virtual world responds noticeably later) and to improve field of view, position accuracy, object tracking, and display resolution. Reinforcing Paul Kruszewski’s point, developer enablers and communities need to be established and matured to generate VR content. And of course, business models that make money need to be developed and proven for the industry to grow.

The last two talks I’ll cover here take us out of this world and into the future. Larry Matthies, head of the Computer Vision Group at NASA's Jet Propulsion Laboratory, gave an extensive keynote on the work JPL is doing on land, sea, air, and space. He discussed how machine vision is being used by existing and future Mars programs to safely land and navigate rovers and future rotorcraft, to control walking robots for the DARPA challenge, and to “see” boats and other obstacles for autonomous marine and underwater craft. As with prior NASA innovation, this research and practical application of vision helps accelerate adoption into our everyday lives.

Jeff Bier, founder of the Embedded Vision Alliance and president of BDTI, gave a closing talk on where embedded vision is heading. He outlined the progression from computer vision research to the practical use of vision in our everyday lives covered by many of the earlier presenters. Among the takeaways on what’s next: vision will become commonplace, just like dictating a text or asking Siri what restaurant is nearby. Hand-tuning of algorithms for each task by PhDs will give way to machine learning and deep neural networks, scaling and accelerating development, and applications will leverage a mix of cloud and edge computation depending on the task. He sees a trend toward vision-specific and heterogeneous processors, with tools and engines evolving to abstract away programming complexity, shifting the work from implementation to integration and broadly expanding the developer community.

What made this Summit different from the ones only a couple of years ago is how far the state of the art has come… and how much excitement there is around the potential, as seen in the level of investment and creativity, how quickly things are going from concept to real products, and the first steps toward understanding how to make money, with maturing discussions around business models, enablement, and ecosystems.

The future of machines that not only see, but perceive what they see and act appropriately to make our lives better is coming. More of it is here now than you may think, but in the next 5-10 years, it will be so ingrained in our fabric we won’t think twice about the AR panel that greets us by name while we gesture to place our order after stepping out of the self-driving transportation-as-a-service car in front of our robot-enabled neighborhood coffee shop.

About the Author

I strongly believe that a huge wave of innovation and growth will come from the convergence of machine learning and machine vision technology deployed to the enterprise and consumers through connected IoT devices and strong ecosystems. I help companies succeed by creating winning products, go-to-market programs, and the targeted ecosystems necessary to accelerate revenue growth. Let me know how I can help you.

