This blog post was originally published in the December 2015 edition of BDTI's InsideDSP newsletter. It is reprinted here with the permission of BDTI.
Since reading Malcolm Gladwell's Blink a decade ago, I've been intrigued by how the mind works – particularly how judgements and decisions are made. I've been inspired to take an armchair tour of research on this topic, and have encountered fascinating insights from the likes of David Eagleman and Daniel Kahneman.
Reading the work of these talented researchers and writers has led me to the inescapable conclusion that most of our judgements and decision-making take place in our subconscious minds.
I consider myself a hyper-rational engineering type, so the idea that my subconscious is calling the shots – based not on deliberation and calculation but rather on intuition – was initially uncomfortable. Lately, though, I've come to appreciate the value of intuition – the way it can alert me to a dangerous situation before I comprehend the nature of the danger, for example, or warn me that someone's being untruthful before I'm able to identify the actual lie.
And that has started me wondering: What if our devices, systems and applications could gain this type of intuitive insight? For example, what if a device could warn you that there's been a change in your elderly parent's posture or gait that might indicate an increased risk of falls? Or that your teenager's distracted driving indicates a higher risk of an accident?
Because our subconscious processes are by definition hidden from us, it might seem futile to try to create programs to emulate them. But I think that deep neural networks (the kind that have recently been beating humans at image classification tasks) offer an elegant solution.
Creating a deep neural network to distinguish between classes of objects or events (for example, a genuine smile vs. a faked one) does not require devising an algorithm that emulates the mechanisms humans use to distinguish between these cases. Instead, the neural network acts as a generalized learning machine, and the developer trains it to recognize meaningful differences by showing it large numbers of examples.
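To make the idea of training by example concrete, here is a minimal sketch in Python. It is not a deep network: it trains a single artificial neuron (a perceptron), the simplest building block of such networks. The two input features and the labels are invented purely for illustration. The point is that the code contains no hand-written rule for telling the classes apart; the weights change only in response to labeled examples.

```python
# Toy illustration: one artificial neuron (a perceptron) learns from examples.
# The features and labels below are hypothetical, made up for this sketch:
#   x[0] = 1 if the eyes crinkle, x[1] = 1 if the mouth corners rise
#   label = 1 for a "genuine" smile, 0 for a "faked" one
EXAMPLES = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]


def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if total > 0 else 0


def train(examples, max_epochs=100, learning_rate=1.0):
    """Nudge the weights after every mistake until all examples are correct."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            error = label - predict(weights, bias, x)
            if error != 0:
                weights[0] += learning_rate * error * x[0]
                weights[1] += learning_rate * error * x[1]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every example classified correctly
            break
    return weights, bias


weights, bias = train(EXAMPLES)
predictions = [predict(weights, bias, x) for x, _ in EXAMPLES]
print(predictions)
```

A real deep network stacks many layers of such units and needs far more examples, but the workflow is the same: supply labeled data and let training find the distinguishing pattern.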
One factor that's held back the use of deep neural networks is processing power. Training deep neural networks takes a humongous amount of computation, and running them once trained still takes quite a lot. Only very recently has this level of processing power become available at practical prices – including in embedded processors suitable for high-volume, cost-sensitive products.
Because deep neural networks are massively parallel structures, they are well suited to acceleration using massively parallel architectures. And because they have simple, highly repetitive structures, they're also amenable to acceleration via specialized architectures. As a result, I think we can expect rapid improvements in cost-performance and energy-efficiency of processors for neural network applications – far outstripping the modest gains enabled by advances in chip manufacturing.
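The parallelism is easy to see in a single fully connected layer: every output neuron performs the same simple operation (a weighted sum followed by a nonlinearity) on the same inputs, and no neuron depends on another neuron in its layer. The sketch below, with made-up weights and function names chosen for illustration, computes one layer both sequentially and with a thread pool and checks that the results match. Python threads are used here only to show that the work is independent; they are not a realistic way to get a speed-up, which in practice comes from hardware that runs these operations side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, followed by a ReLU."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, total)


def layer_sequential(weight_rows, biases, inputs):
    """Compute each neuron's output one after another."""
    return [neuron_output(w, b, inputs) for w, b in zip(weight_rows, biases)]


def layer_parallel(weight_rows, biases, inputs, workers=4):
    """Compute each neuron's output as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = pool.map(
            lambda pair: neuron_output(pair[0], pair[1], inputs),
            zip(weight_rows, biases),
        )
        return list(tasks)  # map() preserves the input order


# Hypothetical values, chosen only to make the arithmetic easy to follow.
INPUTS = [1.0, 2.0, 3.0]
WEIGHT_ROWS = [
    [0.5, -1.0, 0.25],
    [1.0, 1.0, 1.0],
    [-1.0, 0.0, 0.5],
    [0.0, 0.5, -0.5],
]
BIASES = [0.0, -1.0, 0.5, 0.25]

sequential = layer_sequential(WEIGHT_ROWS, BIASES, INPUTS)
parallel = layer_parallel(WEIGHT_ROWS, BIASES, INPUTS)
print(sequential)
print(parallel == sequential)
```

The same repeated multiply-and-add pattern appears throughout a network, which is what makes both massively parallel and specialized architectures a natural fit.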
This means that, very soon, developers of many types of systems, applications and devices will be able to incorporate new types of intelligence into their products. But to do so, they'll need to understand how deep neural networks work, how to design them, and how to train them.
To address this need, I'm proud to be partnering with the developers of the popular Caffe open source deep learning framework to present a one-day tutorial on deep learning for computer vision. It will take place on February 22 in Santa Clara, California. For details about this unique event, please visit the tutorial web page.
I believe that the combination of deep learning and computer vision will create world-changing products and bring vast opportunities, and I'm eager to harness it. What do you think? Please share your thoughts by leaving a comment.
By Jeff Bier
Founder: Embedded Vision Alliance
President and Co-Founder: BDTI